RESTRICTED AMPLICON ANALYSIS 

Field of the Invention 

The present invention generally provides a method which facilitates the 
detection of polymorphisms (or mutations). The method is directed to the analysis of 
so-called endonuclease site polymorphisms (ESPs) that result in the gain or loss of a 
restriction endonuclease site. In essence, the ESP is probed with the restriction 
endonuclease reagent prior to amplification, whereby amplification is prevented and 
consequently no signal is observed when cleavage takes place. Unambiguous allele 
calling is performed by comparing the signals obtained with and without cleavage with 
the restriction endonuclease reagent. The method is particularly useful for multiplex 
genotyping, involving the parallel analysis of large numbers of single nucleotide 
polymorphisms. Preferred methods for detecting the amplicons involve hybridization 
to an arrayed or otherwise identifiable set of cognate probe fragments or 
oligonucleotides. 



Background of the Invention 

Molecular approaches for genetic analyses trace the nucleotide sequence 
variation that occurs naturally and randomly in the genomes of all living species. 
Knowledge of the DNA polymorphisms among individuals and between populations 
is important in understanding the complex links between genotypic and phenotypic 
variation. In the absence of complete data about sequence variation, one relies on the 
ability to identify nearby markers that allow one to infer the location of certain relevant 
loci or causal sequence variations. The informativeness of the marker depends on the 
magnitude of the linkage disequilibrium. Markers can be used in linkage studies to 
search for candidate genes and in association studies to identify the functional allelic 
variation on candidate genes that influence inter-individual variation. 

The vast majority of sequence variation consists of nucleotide 
substitutions, often referred to as single nucleotide polymorphisms (SNPs), resulting 
from mutations that have accumulated during evolution. Most of these nucleotide 
changes are genetically silent; i.e., they have no measurable biological effect, but 
provide an immense reservoir of variation in DNA structure. Most methods for genetic 
analysis used today rely on the detection of nucleotide sequence variation which can 
be measured by DNA fragment analysis using electrophoretic separation, in which 
DNA fragments are fractionated based on size or conformation. Occasionally the 
nucleotide sequence variation will affect either the presence of the DNA fragment or 
its mobility. In this way the primary nucleotide sequence variation will give rise to 
easily detectable DNA fragment polymorphism. Since polymorphic DNA fragments 
are derived from precise locations on the organism's genome, they can serve as reliable 
genetic markers, or landmarks to identify and locate genes. 

A host of assays to detect DNA polymorphisms, and SNPs in particular, 
have been developed. In some of these assays (e.g., RFLP [Botstein, D., White, R.L., 
Skolnick, M., Davis, R.W., Am. J. Hum. Genet. 32:314-331 (1998)], CAPS 
[Konieczny, A., Ausubel, J.F., Plant J. 4:403-410 (1993)], dCAPS [Neff, M.M., Neff, 
J.D., Chory, J., Pepper, A.E., The Plant Journal 14:387-392 (1998)], PIRA 
[Steinborn, R., Muller, M., Brem, G., Biochim. Biophys. Acta 1397:295-304 (1998)]), 
restriction enzymes are used to detect polymorphic nucleotide sequences that affect 
cleavage. The specificity of restriction enzymes is such that they exhibit a unique 
sensitivity to detect single nucleotide differences occurring in their recognition sites. 
The principal strengths of restriction enzyme-based genetic analyses are the ease of use 
and the robustness of the assays. In the majority of the cases, the restriction site 
polymorphism is used to detect known, previously identified SNPs and the assay 
consists of an electrophoretic fragment analysis. In one report, the allelic variation 
is detected in a solid-phase ELISA-type setting [Truett, G.E., Walker, J.A., Wilson, 
J.B., Redmann, S.M. Jr., Tulley, R.T., Eckardt, G.R., Plastow, G., Mamm. Genome 
9:629-632 (1998)]. 

In WO 91/17269, Lenier et al. describe a different method for mapping 
a eukaryotic chromosome by restriction endonuclease mapping of discrete DNA 
sequences which are complementary to a region of a eukaryotic chromosome. 

Vos et al., Nucl. Acids Res. 23:4407-4414 (1995) and EP 0 534 858 
describe a technique for DNA fingerprinting called AFLP which is based on the 
selective polymerase chain reaction based amplification of restriction fragments of a 
digest of genomic DNA. The amplification reaction depends on the use of primers that 
extend into restriction fragments, amplifying only those fragments in which the primer 
extensions match the nucleotide sequence flanking the restriction sites. 

Another method utilizing DNA amplification steps is set out in Williams 
et al., Nucl. Acids Res. 18:6531-6535 (1990), who describe a DNA fingerprinting 
method termed random amplified polymorphic DNA. 

DNA amplification fingerprinting was described by Caetano-Anollés et al. in 
Bio/Technology 9:553-557 (1991). Still another fingerprinting technique called 
arbitrarily primed PCR was described in Welsh et al., Nucl. Acids Res. 18:7213-7218 
(1990) and Welsh et al., Nucl. Acids Res. 19:861-866 (1991). 

In WO 94/11530, Cantor et al. describe materials and methods for 
positional sequencing by hybridization. Cantor et al. also describe methods for 
creating arrays of DNA probes useful in the practice of their method. 

The major shortcoming of the current methods of genetic analysis is the 
limited resolution of the DNA fragment analysis systems, namely the number of DNA 
fragments that can be separated in a single assay. Generally the fractionation resolution 
ranges from tens to a couple of hundred DNA fragments, at the most. Consequently, 
current genetic analysis methods are limited to a few hundred to a thousand genetic 
markers. While this resolution has been sufficient for analyzing simple genetic traits 
determined by single genes, the analysis of complex traits, which is now being 
undertaken and which involve several or many different genes, will require the analysis 
of a much larger number of genetic markers. It is anticipated that such studies will 
require from a few thousand to possibly several hundred thousand genetic markers. 
Although this could conceivably be accomplished by performing many parallel assays, 
such scaling up will be cost- and labor-prohibitive. 

A technology that has great potential and which is generating widespread 
interest is the so-called micro-array technology (DNA chips). In general, these 
methods are based on measurement of the hybridization of DNA sequences in solution 
to probe sequences that are arrayed on a solid surface. When assaying nucleotide 
polymorphisms, the detection relies on the small differences in hybridization efficiency 
between two different DNA sequences. In one format, fluorescently labeled sample 
DNA is hybridized to dense arrays of probe nucleic acids, sequence-specific 
hybridization signal is detected by scanning confocal microscopy, and DNA variants 
are scored as (predictable) differences in the hybridization pattern. The micro-arrays are 
fabricated either by in-situ light-directed oligonucleotide synthesis [Fodor, S.P.A. et 
al., Science 251: 767 (1991)] or by spotting DNA (off-chip synthesized 
oligonucleotides or PCR fragments) in an automated procedure. The technology has 
already been demonstrated in the scoring of mutations in mitochondrial DNA [Chee, 
M. et al., Science 274: 610-614 (1996)], the HIV genome [Lipshutz, R.J. et al., 
BioTechniques 19: 442-447 (1995)], the CFTR cystic fibrosis gene [Cronin, M.T. et 
al., Human Mut. 7: 244-255 (1996)], the BRCA1 breast cancer gene [Hacia, G.H. et 
al., Nat. Genet. 14: 441-447 (1996)] as well as the entire yeast genome [Winzeler, 
E.A. et al., Science 281:1194 (1998)]. In comparison with most other assays, 
micro-arrays provide a platform for high-throughput, massively parallel polymorphism 
detection. 

A major disadvantage with the use of microarrays relates to the 
complexity of the hybridization reaction. The detection relies on the very small 
difference in hybridization of DNA sequences differing by only one nucleotide. In 
general, a set of 4 oligonucleotides, differing only in the identity of the central base, 
is synthesized for each position in the target sequence that has to be interrogated. In 
practice, the number of oligonucleotides needed to correctly genotype one SNP is much 
larger, involving up to 56 different oligonucleotides spanning the variable base [Wang 
et al., Science 280: 1077-1082 (1998)]. The degree of redundancy is also dramatic if 
one wants to screen the target DNA for all possible mutations; the design then includes 
overlapping oligonucleotide sets that are offset by one base (a process known as tiling). 
It should be noted that the detection of SNPs by hybridization to arrays depends on the 
use of short oligonucleotide probes. With longer probes such as DNA fragments in the 
size range of 50 to 500 base pairs or larger, it is not possible to distinguish the SNP 
alleles. 

Summary of the Invention 
The present invention is directed to methods for genotyping 
polymorphisms that result in the gain or loss of an endonuclease cleavage site. Such 
polymorphisms are referred to hereinafter as endonuclease site polymorphisms (ESPs). 
Polymorphisms detectable according to the methods of the present invention include 
single nucleotide polymorphisms (SNPs). The methods of the present invention exploit 
the high discriminatory power of restriction enzymes in a "Restricted Amplicon Assay" 
(RAA) which generally comprises the following steps (see Figure 1): 

(a) isolating sample DNA; 

(b) deriving a set of target DNA fragments, said set of target 
fragments comprising concomitantly amplifiable target DNA fragments from the 
sample DNA; 

(c) treating the target DNA fragments obtained in step (b) with a probe 
restriction endonuclease reagent; 

(d) amplifying the amplifiable probe restriction endonuclease reagent 
treated target DNA fragments of step (c); and 

(e) analyzing the DNA of step (d) to determine which target 
fragments are amplified and/or which target fragments are not amplified; and wherein 
amplified target fragments lack a recognition site for the probe restriction endonuclease 
reagent and target fragments having a recognition site for a probe restriction 
endonuclease reagent are not amplified. 
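The logic of steps (c) through (e) can be sketched in silico. The following is a minimal illustration only, not the claimed laboratory procedure: it assumes complete digestion, treats each target fragment as a plain sequence string, and uses hypothetical names (`has_site`, `restricted_amplicon_assay`) and made-up example sequences. The recognition sequence TTAA (that of MseI) is used as the probe site.

```python
# Minimal in-silico sketch of the RAA principle (assumes complete digestion).
# All function names and sequences are illustrative, not taken from the source.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_site(fragment: str, site: str) -> bool:
    """True if the recognition site occurs on either strand of the fragment."""
    fragment, site = fragment.upper(), site.upper()
    return site in fragment or reverse_complement(site) in fragment


def restricted_amplicon_assay(targets: dict[str, str], probe_site: str) -> dict[str, bool]:
    """Map each target fragment to True (amplified) or False (cleaved, not amplified)."""
    return {name: not has_site(seq, probe_site) for name, seq in targets.items()}


if __name__ == "__main__":
    targets = {
        "allele_with_site": "GATCCGTTAAGCTAGG",     # carries TTAA, is cleaved
        "allele_without_site": "GATCCGTCAAGCTAGG",  # T>C change destroys the site
    }
    print(restricted_amplicon_assay(targets, "TTAA"))
```

The fragment carrying the site yields no amplicon, while the variant fragment is amplified, which is the presence/absence read-out described in step (e).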

In one aspect, the present invention is directed to RAA-methods, which 
comprise the preparation of concomitantly amplifiable DNA segments by digestion of 
the starting DNA with one or more restriction endonucleases, collectively referred to 
herein as sampling enzymes. This method is herein referred to as format-I RAA and 
is diagrammed in Figure 2. The digested starting DNA may be further modified at its 
termini by the addition of adapters, which may serve to prime an amplification reaction 
(see Figure 2). Once sample DNA is obtained, it is treated with a different restriction 
enzyme, the probing enzyme, also referred to as a probe restriction endonuclease 
reagent. A combination of probing and sampling enzymes is chosen such that a 
substantial fraction of the sample fragments contain a single recognition site for the 
probe endonuclease reagent. In general, probe enzymes used with format-I RAA 
preferably have as a recognition site a nucleotide sequence of less than six nucleotides. 

In another aspect, the present invention is directed to methods for 
format-II RAA for the detection of ESPs, as diagrammed in Figure 3. Format-II RAA 
operates on the same principle as format-I RAA except that the sample amplicons need 
not be DNA fragments, but are rather defined regions of a genome amplifiable with 
specific primer pairs. The amplicons of the format-II RAA are identified on the basis 
of sequence data; e.g., the sequence of ESP-containing restriction fragments identified 
using the format-I RAA method or otherwise known SNPs affecting endonuclease cleavage 
sites. In format-II RAA, the test DNA to be analyzed is treated with a probe restriction 
endonuclease reagent, followed by the concomitant amplification of regions of the 
treated DNA (amplicons) using predetermined primers using, for example, the 
polymerase chain reaction as described herein. The analysis of the amplification 
products then proceeds as described in the format-I RAA methods described herein. 
As with format-I RAA, an ESP is genotyped by the presence or absence of a 
recognition site for the probe restriction endonuclease reagent. 

In yet another aspect, the present invention is directed to methods for 
format-III RAA. In essence, format-III RAA consists of a combination of the format-I 
and format-II approaches. One such combination is diagrammed in Figure 4. Test 
DNA, digested or not with a probe endonuclease reagent, is sampled with a pair of 
endonuclease reagents and the resulting fragments are co-amplified as described in the 
format-I assay (this step is referred to as the pre-amplification step). These 
pre-amplification mixtures are, in turn, used as templates for a format-II type of PCR 
reaction in which multiple ESP-containing regions are selectively co-amplified using 
specific primer sets. The analysis of the amplification products then proceeds as 
described before. The advantages of format-III RAA are that the stepwise 
amplification facilitates the multiplex PCR of the ESP-specific amplicons and lowers 
the amount of starting material required to interrogate all the ESPs. 

Arrays, or microarrays, of probe DNA wherein the probe DNAs are 
useful in the detection of ESPs are also encompassed by the present invention. 
Informative probe DNAs are prepared and identified as described in detail below and 
are then attached to a substrate for use in the hybridization reactions with concomitantly 
amplifiable DNA after treatment with a probe restriction endonuclease reagent and 
subsequent amplification. 

Since the method of the invention is based on the detection of a 
particular kind of DNA polymorphism, which occurs in DNA of any organism, the 
invention will be universally applicable. The methods of the present invention may be 
used to genotype ESPs in a wide variety of organisms from prokaryotic organisms, 
such as bacteria, through complex eukaryotic organisms, viruses, or any organism 
having a genome however simple or complex. The methods may also be used for the 
analysis of extrachromosomal DNA, the DNA found in certain cellular organelles, 
cDNA preparations, or DNA libraries, such as yeast artificial chromosome libraries 
and others. Furthermore, based on the large body of DNA sequence data at hand, it 
is predicted that the genomes of higher organisms carry several hundreds of thousands 
of such DNA polymorphisms. Consequently, the new method is capable of diagnosing 
the immense number of genetic markers that are needed to unravel complex traits. The 
method is of tremendous value for high throughput genetic analysis in the emerging 
field of pharmacogenomics. Similarly, the method has great potential in the field of 
animal and plant breeding, where high resolution genetic analysis will be needed to 
identify the genes involved in quantitative agronomic traits. 

Various aspects of the present invention are described in more detail 
below (see Detailed Description of the Invention). Variations in each of these aspects 
will be readily appreciated by one of ordinary skill in the art and are within the scope 
of the invention. 

Brief Description of the Drawings 
Figure 1 depicts the general concept of the Restricted Amplicon Assay. 
The vertical arrows indicate the positions of the ESPs. The open circles denote the 
probing enzyme sites that are present, while the closed circles denote the mutated sites. 
The first step involves cleavage of the test DNA with the probing endonuclease. The 
second step involves PCR amplification of DNA segments comprising the ESPs. The 
small horizontal arrows denote the PCR primers flanking the ESPs. When cleavage 
occurs the DNA is cut between the PCR primers, preventing the subsequent 
amplification of the DNA segment comprising those ESPs. Only those DNA segments 
that were not cleaved are amplified. The final step comprises assaying the amplicons. 

Figure 2: Diagrammed representation of format-I RAA. The vertical 
arrows indicate the positions of the ESPs, with the open and closed circles denoting the 
probing enzyme sites that are respectively present and absent. Step 1 represents the 
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the 
sampling enzyme cleavage sites. Step 2 represents the adapter ligation step. The open 
lines represent the adapters ligated to the ends of the sampled restriction fragments. 
Step 3 represents the probing enzyme cleavage step and the small horizontal arrows 
denote the PCR primers matching the adapter sequences. Step 4 represents the PCR 
amplification step in which only the sample fragments that are not cleaved by the 
probing enzyme are amplified. The crossed circles represent the fragments that are not 
amplified. 

Figure 3: Diagrammed representation of format-II RAA. The vertical 
arrows indicate the positions of the ESPs, with the open and closed circles denoting the 
probing enzyme sites that are respectively present and absent. Step 1 represents the 
probing enzyme cleavage step. The dotted boxes denote the DNA sequences flanking 
the ESP sites. Step 2 represents the PCR primer design. The small horizontal arrows 
denote the PCR primers flanking the ESPs. Step 3 represents the PCR amplification step 
in which only the sample fragments that are not cleaved by the probing enzyme are 
amplified. The crossed circles represent the fragments that are not amplified. 

Figure 4: Diagrammed representation of format-III RAA. The vertical 
arrows indicate the positions of the ESPs, with the open and closed circles denoting the 
probing enzyme sites that are respectively present and absent. Step 1 represents the 
sampling enzyme cleavage step. The vertical dotted arrows indicate the positions of the 
sampling enzyme cleavage sites. Step 2 represents the pre-amplification step in which 
the sampled fragments are amplified. Step 3 represents the probing enzyme cleavage 
step. Step 4 represents the PCR primer design. The small horizontal arrows denote the 
PCR primers flanking the ESPs. Step 5 represents the PCR amplification step in which 
only the sample fragments that are not cleaved by the probing enzyme are amplified. 
The crossed circles represent the fragments that are not amplified. 

Figure 5: Graphic representation of target fragments produced by 
cleavage with a hexacutter (full arrows) and a tetracutter (dotted arrows) restriction 
enzyme. Two types of fragments are produced: type I fragments (dotted lines) carrying 
two tetracutter ends and type II fragments (full lines) carrying one hexacutter end 
(represented by the arrowhead) and one tetracutter end. Upon PCR amplification only 
the type II fragments are amplified. 

Figure 6: EcoRI-BfaI fragments from ecotypes Columbia (C) and 
Landsberg (L) obtained after selective amplification using EcoRI and BfaI AFLP 
primers with respectively 2 and 3 selective nucleotides. The fragment patterns were 
obtained respectively without probing enzyme (no enzyme) and after digestion with the 
MseI probing enzyme. It is noted that most of the larger fragments do not survive after 
MseI digestion, while the majority of the smaller fragments survive the treatment. The 
differences between the ecotypes Columbia (C) and Landsberg (L) observed after MseI 
digestion, marked by the arrows, represent ESP carrying fragments. The differences 
found without MseI digestion, marked by the stars, represent typical AFLP 
polymorphisms. 

Figure 7: Hybridization patterns obtained on the Arabidopsis micro-arrays. 
The layout of the Arabidopsis micro-array is as follows: the left panel contains 
the ESP fragment probes derived from Columbia (upper half) and Landsberg (lower 
half), while the right panel contains the control monomorphic probes with respectively 
the negative control fragments (- control) always carrying a probing endonuclease site 
and the positive control fragments (+ control) carrying no probing endonuclease site. 
The upper part of the figure shows the hybridization patterns obtained with uncleaved 
sample DNA, while the lower part of the figure shows the hybridization patterns 
obtained with cleaved sample DNA. The circle code is as follows: light-grey 
circles represent hybridization with the Cy3-labeled fragments, dark-grey circles 
represent hybridization with the Cy5-labeled fragments, black circles represent 
hybridization with both the Cy3-labeled and the Cy5-labeled fragments, and open 
circles represent no hybridization. In this figure a set of idealized results is 
presented. The hybridization patterns with the uncleaved sample DNA show that all 
probes detect sequences in both ecotypes, while the hybridization patterns with the 
cleaved sample DNA show that the ESP fragment probes detect only the sequences in 
the respective ecotypes from which the ESP fragments were isolated. In addition, 
fragments carrying no site for the probing enzyme detect sequences in both ecotypes, 
while fragments that always carry a site for the probing enzyme do not show a 
hybridization signal. 

Figure 8: Hybridization patterns obtained on the corn micro-arrays. The 
layout of the corn micro-array is as follows: the left panel of probes contains random 
fragments derived from B73, while the right panel contains Mo17-fragments. The 
figure shows four hybridization patterns obtained with respectively uncleaved sample 
DNA, MseI-cleaved, Tsp509I-cleaved and AluI-cleaved sample DNA. The 
uncleaved sample DNA hybridization pattern shows probes that hybridize only to B73 
(light-grey circles), respectively Mo17 (dark-grey circles) fragments, which represent 
polymorphisms resulting from mutations in the sampling enzyme recognition sites. The 
cross in the circle indicates that these probes are eliminated from the analysis. The 
cleaved sample DNA hybridization patterns show that the majority of the probes do not 
give a hybridization signal, indicating that their cognate fragments are cleaved by the 
probing enzyme. Most of the probes giving a signal hybridize to both sample DNAs. 
Those that hybridize to only one of the sample DNAs and that were not eliminated 
represent fragments carrying ESPs. The arrows denote the probes that were retained 
for further analysis. 

Detailed Description of the Invention 

The term "SNP" means Single Nucleotide Polymorphism, i.e. a 
polymorphism involving the mutation of a single base-pair. 

The term "ESP" means Endonuclease Site Polymorphism, i.e. a 
polymorphism involving two alleles one of which is cleaved by an endonuclease 
reagent while the other exhibits (at least partial) resistance to cleavage by the same 
endonuclease under the same conditions. 

The phrase "(restriction) endonuclease reagent" refers to a reagent that 
consists of one or more enzymes and that cleaves nucleic acids with a certain 
specificity, i.e. cleavage involves recognition of a particular sequence or set of 
sequences on the target DNA. Endonuclease reagents include but are not limited to the 
common type II restriction enzymes. 

The term "sampling endonuclease(s)" or "sampling enzyme(s)" refers to 
an endonuclease reagent used to derive sets of fragments from the sample DNA. 

The term "probing endonuclease(s)" or "probing enzyme(s)" refers to an endonuclease 
reagent used to probe the allelic state at specific ESP-sites. 

The term "polymorphism" refers to the existence of two or more alleles 
at significant frequencies (>1%) in the population; polymorphism at a single 
chromosomal location constitutes a genetic marker. 

The term "micro-satellite (DNA)" refers to a small array (often less than 
0.1 kb) of tandem repeats of a very simple sequence, often 1 to 4 base-pairs. Variability 
at such a locus is the basis of many genetic markers. 

The term "mutation" means a heritable alteration in the DNA sequence. 

The term "allele" refers to one of several alternative sequence variants 
at a specific locus. 

The term "genotype" is commonly known to mean (i) the genetic 
constitution of an individual, or (ii) the types of allele found at a locus in an individual. 

The term "haplotype" refers to the genotype at a series of linked loci on 
a single chromosome. 

The term "sample DNA" or "sample fragments" refers to the set of 
fragments or amplicons derived from the starting DNA by the RAA method. 

The term "zygosity" refers to the homozygous or heterozygous state. 

The term "homozygosity/homozygous" refers to the presence of identical 
alleles at a locus. 

The term "heterozygosity/heterozygous" refers to the presence of 
different alleles at a locus. 

The term "CpG" means a dinucleotide with a cytosine at the 5'-side and 
a guanine at the 3'-side. CpG is relatively rare in mammalian DNA because of the 
tendency for the cytosine to be methylated and subsequently mutate to thymine by 
deamination. 

The term "ecotype" refers to a naturally occurring (plant) variety; race. 

The term "bi-allelic" refers to a polymorphic locus characterized by two 
different alleles. 

The terms "microarray" and "(DNA-)chip" refer to a multitude of 
spatially addressable nucleic acids that serve as probes. The microarray may be used 
in the form of a planar solid support, a bead, a sphere, or a polyhedron. Fabrication 
is done either by in situ combinatorial synthesis of oligonucleotides using 
photolithography, or by robotic spotting of off-chip prepared DNA onto a solid 
surface. 

The methods of the present invention differ conceptually from 
previously described restriction enzyme-dependent assays (supra) that essentially detect 
a fragment length polymorphism. With the present method, starting DNA is restricted 
prior to the amplification reaction and, rather than analyzing the obtained amplification 
product, the presence or absence of amplification is measured to determine the allelic 
state at an ESP site. The treated DNA is preferably amplified by using a polymerase 
chain reaction and is preferably analyzed by means of hybridization against arrays of 
probe DNAs. With the present method, a sample-amplicon, and consequently a 
hybridization signal, is either present or virtually absent. This feature represents a 
major advantage in that it results in a more accurate distinction between variable 
nucleotides than is possible by differential hybridization to allele-specific 
oligonucleotides, and because it greatly facilitates the identification of a set of generally 
useful hybridization conditions. Also, the methods of the invention permit the use of 
both oligonucleotides and DNA fragments as probe DNAs. While hybridization 
to arrays allows the simultaneous analysis of a large number of ESPs, it should be clear 
that the amplification of sample DNA, treated with probe restriction endonuclease 
reagent, can be analyzed by any of a variety of methods well known in the art. In these 
methods, an ESP is identified either by the presence of a recognition site for the probe 
restriction endonuclease reagent (which will result in the failure of the sample DNA to 
amplify) or by the loss of a recognition site which will allow amplification of an 
otherwise unamplifiable sample DNA. Alternative methods include, but are not limited 
to, gel-electrophoretic analysis, and the TaqMan assay [Holland P. M. et al., Proc. 
Natl. Acad. Sci. 88: 7276-7280 (1991); with the latter assay detection is done during 
rather than after the amplification reaction]. 

One of the advantages of the method of the invention is the ability to 
calibrate the measured signal against that obtained in a control experiment where 
digestion with the probe restriction endonuclease reagent is omitted. Comparison of the 
respective hybridization signals, following various corrections and normalization 
procedures, is essential for the genotyping of ESPs and the accurate determination of 
the zygosity. The cleaved and uncleaved material can, in principle, be hybridized 
separately but a preferred method consists of hybridizing a mixture of the differentially 
labeled samples to the same array. The present invention is exemplified by several 
specific formats described below. 
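The comparison of cleaved and uncleaved signals can be illustrated with a small genotype-calling sketch. This is a hedged illustration only: the function name `call_esp_genotype`, the background subtraction, and the cut-off values 0.25 and 0.75 are assumptions introduced here, not values given in the specification. It further assumes that the signal scales roughly linearly with the amount of uncleaved template, so that a heterozygote retains about half of the control signal.

```python
# Hypothetical genotype call from the ratio of cleaved to uncleaved signal.
# Thresholds and the linear-signal assumption are illustrative, not from the source.

def call_esp_genotype(
    cleaved: float,
    uncleaved: float,
    background: float = 0.0,
    low: float = 0.25,
    high: float = 0.75,
) -> str:
    """Classify one ESP probe from its cleaved and uncleaved (control) signals."""
    reference = uncleaved - background
    if reference <= 0:
        # No usable control signal: the probe cannot be calibrated.
        return "no call"
    ratio = max(cleaved - background, 0.0) / reference
    if ratio < low:
        return "site/site"        # both alleles cleaved, signal lost
    if ratio < high:
        return "site/no-site"     # heterozygous, roughly half the signal
    return "no-site/no-site"      # neither allele cleaved, signal retained


if __name__ == "__main__":
    for cleaved, uncleaved in [(50, 1000), (480, 1000), (990, 1000), (100, 0)]:
        print(cleaved, uncleaved, call_esp_genotype(cleaved, uncleaved))
```

In practice the cut-offs would have to be derived from the corrections and normalization procedures applied to the actual array data.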

(D Fonaat"I RAA; Choice of sampling and probing restriction 
endonuclease reagents. In one of its embodiments the present invention is directed 

to methods for detecting ESPs in a "restricted amplicon assay" (RAA) which comprises 

15 preparing concomitantly amplifiable restriction fragments from the starting DNA 
(sample DNA). Whm generating discrete sets of DNA fragments from genomic DNA, 
the following parameters are important the average fragment size and the total number 
of fragments. The optimal fragment size for use in the methods (and materials) of the 
present invention is a trade off; the fragments must be sufficiently small for 

20 amplification with roughly equal efficiency (in general <500 base pairs) and large 
enough for having on average one cleavage site for the probing endonuclease reagent. 
In addition to average fragment size, the number of fragments determine the 
complexity of the sample DNA which is critical in view of the limitations of the 
detection sensitivity of micro-array hybridization. In general, the current state of the 

25 art of-microarray hybridization is such that the number of sample fragments should not 
exceed 100,0CX). All of the above-mentioned requisites can be met by the appropriate 
choice of sampling and probing enzymes. A preferred method of the present invention 
to prepare sample DNAs (amplicons) involves the use of two different sampling 
enzymes, a rare cutter endonuclease {e.g. , hexacutter) combined with a frequent cutter 

30 endonuclease {e.g., tetracutter), as described in EP 0 534 858 Al which describes a 
method called AFLP and which is incorporated herein by reference. As can be seen 
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from Figure 5, the rare cutter enzyme produces large fragments that iqwn cleavage 
with the frequent cutter aizyme are cut into a number of smaller fragments. This dual 
cleavage generates two types of fragments: the majority having both ends produced by 
the frequent cutter (type I) and a minority of fragments having a rare cutter end and a 
frequent cutter end (type II). After ligating different adapters to each of the ends and
using appropriate primers targeted to the ends of the fragments, only the type II
fragments will be amplified efficiently (see Figure 5). The type I fragments amplify
with greatly reduced efficiency, presumably because the synthetic sequences at the two
ends constitute an inverted repeat. In general the type II fragments will amplify
synchronously using a single PCR primer pair that attaches to the ends of the
fragments. The size limit is typically around 500 base pairs, but can be increased by
using a different DNA polymerase and other reaction conditions. Thus, as outlined
above, the number of amplifiable fragments will be determined primarily by the choice
of the rare cutter restriction enzyme. By approximation, this number equals two times
the number of cleavage sites for the rare cutter. In a preferred embodiment, restriction
enzymes recognizing 6 nucleotides (hexacutters) or more are used as rare cutters. The
use of a frequent cutter recognizing 4 nucleotides (tetracutter) as second sampling 
enzyme results in the production of fragments in the optimal size range for co- 
amplification. As probe restriction endonuclease reagents, different tetracutter or 
pentacutter enzymes can be used. The probe restriction endonuclease reagent and the 
frequent cutter sampling enzyme should preferably be chosen such that the ratio of the 
cleavage frequencies of probing over sampling reagent is >0.5 and <3. This will 
ensure that a substantial fraction of the target fragments are cleaved once by the 
probing enzyme. It is noted that ESPs cannot be genotyped when the fragments are 
cleaved more than once by the probing enzyme. Also, it should be recognized that 
cleavage with the probe restriction endonuclease reagent results in a significant 
reduction (typically 2-4 fold) of the fragment complexity. 
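
The effect of the recommended ratio window can be illustrated with a rough
calculation. The sketch below rests on an assumption that is not stated in the text:
probing sites are taken to occur at random along a fragment of average length, so
that the number of probing sites per fragment follows a Poisson distribution whose
mean equals the ratio of probing to sampling cleavage frequencies. The function name
is hypothetical.

```python
import math


def fraction_cleaved_once(ratio: float) -> float:
    """Poisson probability that a fragment carries exactly one probing site.

    `ratio` is the cleavage frequency of the probing reagent divided by that
    of the frequent cutter sampling enzyme. Assumption (not from the text):
    a fragment of average length then carries `ratio` probing sites on
    average, placed at random.
    """
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio * math.exp(-ratio)


for r in (0.5, 1.0, 3.0):
    print(f"ratio {r}: {fraction_cleaved_once(r):.2f} of fragments cut exactly once")
```

Under this simplified model the fraction of fragments cut exactly once is highest at
a ratio of 1 (about 0.37) and falls to about 0.30 and 0.15 at ratios of 0.5 and 3,
respectively.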

Alternative schemes - different from the one described above - that meet
the requisites of sample complexity, average fragment size, and occurrence frequency
of the probe reagent and that will perform equally well, will be readily apparent to one
of ordinary skill in the art. Alternative schemes may include the use of pairs of
frequent cutters, followed by selective amplification (described in EP 0 534 858 A1),
or the use of type IIS restriction enzymes. Type IIS restriction enzymes are
characterized by an asymmetric recognition sequence. Most of these enzymes cleave
at a defined distance to one side of the recognition site and generate single stranded
overhangs that have different sequences. Ligation of adaptor sequences that are
complementary to only one type of overhang allows the amplification of specific
subsets of fragments [Kikuya Kato, Nucleic Acids Res. 23: 3685-3690 (1995)]. With
this strategy the set of fragments obtained with the sampling enzymes can be broken
up into a defined number of complementary and roughly equally complex subsets. Thus,
with these enzymes it is possible to tune the complexity of the sample. The same
strategy can be applied by making use of type II enzymes that have an interrupted
palindromic recognition sequence.

Type of mutations detected by format-I RAA: In essence the method
of the invention aims to detect mutations affecting the recognition sequences of the site-
specific probe endonuclease reagents. When the probe enzyme cleaves a sample
fragment, it is prevented from being amplified and as a consequence the fragment will
not give a hybridization signal with its cognate probe. Mutations affecting the
recognition sequence of the probe enzyme will allow amplification of the sample
fragment and will restore the hybridization signal. It is recognized that mutations other
than those affecting the probe enzyme recognition sites may affect the hybridization
signals. In particular, mutations affecting the recognition sites of the sampling
enzymes may also lead to a loss of hybridization signal. Consequently, the mere
detection of a hybridization difference between two samples does not qualify the
difference as being due to an ESP for the probing enzyme. For this one must also
assay the two samples without probing enzyme cleavage; only those differences that are
correlated with the cleavage by the probing enzyme qualify as genuine ESPs as defined
according to the present invention. Therefore, a preferred embodiment of the methods
of the present invention comprises the comparison of the hybridization signals obtained
with and without cleavage of the same starting material by the probe endonuclease
reagent. Preferably, the digested and undigested sample DNAs are differentially
labeled such that equivalent amounts of the material can be mixed and hybridized
against the same array of probes. It is noted that a further advantage of measuring the 
relative hybridization signals obtained with digested and undigested sample DNAs, is 
that the signal given by the undigested sample DNA serves as an internal control for 
correcting variations in amplification and hybridization. 
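
The comparison of digested and undigested signals can be sketched as a simple ratio
test. This is a minimal illustration only: the function name, the minimum-signal
cut-off and the two ratio thresholds are hypothetical values chosen for the sketch and
are not taken from the present description.

```python
def call_site_status(cleaved: float, uncleaved: float,
                     min_signal: float = 100.0,
                     low: float = 0.25, high: float = 0.75) -> str:
    """Classify one probe from its digested/undigested signal ratio.

    `cleaved` is the signal from sample DNA treated with the probing enzyme,
    `uncleaved` the signal from the same sample without treatment (the
    internal control). All numeric thresholds are hypothetical.
    """
    if uncleaved < min_signal:
        # Without a usable control signal the ratio is meaningless.
        return "no call"
    ratio = cleaved / uncleaved
    if ratio < low:
        return "site present"      # fragment cleaved, amplification prevented
    if ratio > high:
        return "site absent"       # fragment survives the probing enzyme
    return "intermediate"          # e.g. possibly one allele of each type


print(call_site_status(cleaved=50.0, uncleaved=1000.0))
print(call_site_status(cleaved=900.0, uncleaved=1000.0))
```

Because each call is normalized to the undigested signal of the same sample,
fragment-to-fragment differences in amplification or hybridization efficiency cancel
out of the ratio.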

Identification and design of informative probes to detect ESP- 
harboring fragments. In a preferred embodiment of the present invention sample 
DNAs (amplicons) are hybridized to micro-arrays comprising a set of probe DNAs 
which are designed such that each probe will hybridize specifically to one sample DNA 
fragment. For each set of sample DNA fragments a specific set of probes is
developed that will detect all the ESPs present in the set of sample DNAs. Since in
most applications only a (minor) fraction of the sample DNAs will actually carry an
ESP for a particular probing reagent, the set of probe DNAs will preferably consist of
a subset of the sample DNA fragments that are informative in that they hybridize to
ESP-harboring sample fragments. Preferably, the probes are highly specific for the
ESP-carrying sample fragments, and do not cross-hybridize with other fragments in the
sample. This feature is verified by testing the candidate probes in control hybridization
assays. When developing or designing the probes care should be taken to avoid
hybridization of the labeled primer used to amplify the sample fragments. When the
probes correspond to a subset of the sample fragments, preferably an alternative set of
adaptors should be used for their amplification.

The sections below describe different approaches that may be used to 
assemble sets of unique probe DNAs for fabricating the micro-arrays. Three 
alternative approaches are presented, and their choice is determined primarily by the 
degree of nucleotide sequence variation, and hence the ESP frequency, present in the 
species under study. 

(1) Direct screening. When the ESP frequency is high, such that 10% or more of the
sample fragments carry ESPs, a realistic approach for assembling ESP probes is
to array individual sample fragments and test which of them detect an ESP in the
test material under study. The advantage of this approach is that the same set of
fragments can be tested with different probe enzymes. After the screening one
will retain only those probes that yield a clear-cut difference in hybridization
between the different test DNAs. This approach is illustrated in Example 2.

(2) Gel-based screening. With genomic DNA exhibiting intermediate ESP frequencies
(a few %), useful probes can be identified with a gel-based screening approach
in which the ESPs are identified by comparing the patterns of sample fragments
obtained from cleaved and uncleaved genomic DNA of various individuals. The
polymorphic fragments can then be isolated from the gel and cloned or amplified.
In a second phase, these probe-fragments are verified in a micro-array
hybridization assay. This approach is illustrated in Example 1.

(3) Batch-wise hybridization selection method. Since both approaches described above
are inefficient and labor intensive when the ESP frequency is low (<1%), it is
advantageous to directly select or enrich ESP-carrying fragments. Such an
approach is described in greater detail in Example 3.

The methods of the invention can be used with any type of micro-array: 
spotted ESP-carrying fragments, spotted oligonucleotides or oligonucleotides 
synthesized on solid supports using photolithography [Fodor S. P. A. et al., Science
251: 767-773 (1991)]. Oligonucleotide probes can easily be designed based on the
nucleotide sequences of the ESP-carrying fragments. Also, the methods of the
invention are not limited to the use of planar arrays containing spatially addressable
probes. A person of skill in the art will recognize that the methods may also employ
a multitude of identifiable solid phase particles (e.g. beads, spheres, and polyhedra),
each carrying a different probe. Examples of such use are described by Fulton, R.
[U.S. Patent No. 5,736,330] and Mandecki, W. [U.S. Patent No. 5,736,332].

(II) Format-II RAA: General outline

The 'format-I RAA' - as described above - can be converted to a
'format-II assay' when sufficient sequence information of ESP-containing sample
fragments becomes known. Format-II RAAs can also be designed on the basis of the
known sequences of genomic regions that harbor an ESP and that are available through
publicly accessible databases. The approach involves the targeted sampling of starting
material and consists of the design of dedicated primer pairs that flank the ESP sites.
As in format-I RAA, if the site is intact, the starting DNA will be cleaved and no
PCR product will be generated. Only when the site is mutated will the amplicon be
generated. In practice, multiple ESP-containing genomic regions are co-amplified after
cleavage with the probing restriction endonuclease reagent. The ultimate sample DNA
used in the hybridization reaction is composed of several such multiplex PCR reactions
pooled together. The feasibility of this approach is evidenced by the recent paper of
Wang et al., Science 280: 1077-1082 (1998), incorporated herein by reference. The
methods for format-II RAA described here are identical to the approach described by
Wang et al. in the way certain allelic regions are co-amplified, but fundamentally
different in the way they are diagnosed. The present method takes advantage of the
clear distinction between having or not having an amplicon depending upon the allelic
state of the endonuclease target site. The Wang et al. approach, in contrast, relies on the
detection of a hybridization difference as a result of a single nucleotide variation in the
PCR product. This requires a much more elaborate and redundant hybridization assay.

Similar to format-I RAA, a preferred method consists of comparing the
hybridization signals obtained with and without cleavage with the probe restriction
endonuclease reagent. Preferably, the respective amplification reactions are
differentially labeled such that the resulting amplicons can be mixed and hybridized
against the same array of probes.

Preferred methods of the format-II RAA are those wherein - of each
PCR primer pair - that primer that remained unlabeled is used as hybridization probe
for the corresponding amplicon. This ensures that the excess unincorporated labeled
primer as well as the primer extension products obtained with this primer cannot anneal
to the arrayed probe. Also, the unlabeled PCR primer is complementary to the labeled
strand of the amplicon.

Furthermore, the format-II RAA method provides a means to monitor
mutations in specific genes or loci in addition to scanning the entire genome. Indeed,
sets of PCR primers that target ESPs in a specific gene or chromosome region can be
assembled.

An RAA assay with positive detection of both alleles: It is
recognized that the 'present/absent-score' of the RAA assay cannot (always) distinguish
between different mutations that can affect cleavage by the probe restriction
endonuclease reagent. In practice, an ESP should not be assayed when available
evidence indicates the existence of two or more such mutations at significant
frequencies in the population.

In a preferred embodiment the present invention is directed to the 
detection of SNPs that result in the simultaneous loss and gain of a restriction enzyme
recognition site, i.e. both alleles are associated with a different recognition site. HgaI
(GACGC) and SfaNI (GATGC) are an example of such reciprocal sites. Use of both
probing endonuclease reagents in side-by-side experiments excludes alternative alleles
and results in easy determination of the zygosity (refer to Example 4).
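
The side-by-side use of two reciprocal probing enzymes can be sketched as a small
decision table. The function name and the returned labels are hypothetical; the sketch
assumes that a signal is scored simply as present or absent after each digestion.

```python
def call_genotype(signal_after_hgai: bool, signal_after_sfani: bool) -> str:
    """Infer zygosity at a GACGC/GATGC reciprocal site.

    An allele carrying GACGC is cleaved by HgaI and therefore gives no
    amplicon after HgaI digestion; an allele carrying GATGC is cleaved by
    SfaNI. A signal after digestion thus reports the *other* allele.
    """
    if signal_after_hgai and signal_after_sfani:
        return "heterozygous GACGC/GATGC"
    if signal_after_hgai:
        return "homozygous GATGC"   # survives HgaI, removed by SfaNI
    if signal_after_sfani:
        return "homozygous GACGC"   # survives SfaNI, removed by HgaI
    return "no call"                # no amplicon in either reaction


print(call_genotype(True, True))
print(call_genotype(True, False))
```

In this scheme each allele is detected by the presence of a signal in one of the two
reactions, rather than by the absence of a signal alone.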

Multi-allelic haplotyping: A single ESP represents a bi-allelic marker,
which is less informative than a variable micro-satellite, which has multiple alleles.
It is possible however to compensate for the lower information content by identifying
several ESPs on a specific chromosomal region. Format-II RAA lends itself readily
to such an approach and involves the design of a primer pair that encompasses a region
with a single site for the various selected probe endonuclease reagents. It should be
recognized as one of the advantages of the present method that multiple ESPs on a
sample amplicon can be interrogated with a single probe. Furthermore, use of the
probing enzymes, either separately or in various combinations, in parallel experiments
allows the construction of the haplotypes for the ESPs under study. In general, the
statistical associations between traits and specific chromosome regions may be more
apparent when haplotypes rather than individual markers are used.



(III) Format-III RAA

In a general sense, the format-III RAA represents a method of choice for
very high-density SNP genotyping because it provides a means to overcome the
intrinsic limitations of both the format-I RAA and the format-II RAA. This is
essentially achieved by performing a stepwise amplification involving a pre-
amplification of sample fragments followed by amplification using multiplexed specific
primers. The principal advantage of the pre-amplification step is to reduce the 
complexity of the starting DNA, and thus to provide a more favorable starting point 
for performing multiplex PCR reactions. It is noted that this improvement is generally 
applicable to any multiplex PCR reaction, and is not limited to the methods of the 
present invention. Such an approach can also be used when for example SNPs are 
genotyped using the methods described by Wang et al.

The principal limitation of the format-I RAA lies in the complexity of 
the sample DNA that is hybridized to the microarray. Because the second round of 
amplification in format-III yields only very small amplicons, which are all informative,
there is no longer a limitation in number of sample fragments that are interrogated. In
fact the entire genome may be sampled in a series of parallel pre-amplification
reactions and the amplicons generated in the different multiplex PCR reactions can then
be pooled together and hybridized to the microarray. 

Likewise, the format-III RAA represents a preferred way of performing format-II
RAA, especially when the ESPs under study are located on fragments generated by one
set of sampling endonuclease reagents. Such stepwise amplification comprises the co-
amplification of sample fragments with a single pair of primers, followed by the
selective amplification of sets of specific ESP-containing regions (see Figure 5). The
principal advantage of the format-III RAA over format-II RAA is that the initial
amplification of the sampling fragments - representing only a fraction of the total
genome - lowers the amount of starting material required to interrogate a very large
number of ESPs. Also, the approach will facilitate the multiplex amplification of the
ESP-specific amplicons and, consequently, yield a more robust assay.

One preferred embodiment of the format-III RAA is its use to genotype
large numbers of ESPs identified through the use of the format-I RAA. Indeed,
format-I RAA offers a rapid means to discover large numbers of ESPs in any biological
species where no large body of sequence information is or will be available. Format-I
RAA enables one to discover many sets of ESPs for a number of different probing
enzymes. Using the format-I RAA, each set of ESPs must be assayed on a different
microarray, because otherwise signals for the same sample fragment will overlap with
one another, and thus preclude determination of the proper ESP genotype. Using the
format-III RAA, the ESPs identified with different probing enzymes are now assayed
together on one single microarray, without overlap between the different ESPs. The
reason is that the overlap in the format-I RAA is caused by the non-informative sample
fragments that are always co-amplified with the ESP fragments. These are eliminated
from the mixture by the specific PCR amplification. This embodiment is illustrated in
Examples 2 and 3.

Another preferred embodiment of the format-lU RAA is its use to 
genotype large numbers of SNPs identified in high-throughput sequencing of genomic 
DNA from different individuals from a given species. Given the generally recognized
importance of SNPs for the development of high-resolution genotyping methods,
sequenced SNPs can be expected to accumulate in large numbers in publicly available
databases in the near future. In particular, in the field of human genetic analysis, SNPs
will be discovered at a rapidly increasing rate through the massive genome sequencing
programs now in progress. A similar evolution may be anticipated for many other 
species. Hence we decided to perform an in silico analysis of known human SNPs to 
further investigate the potential of the invention. More particularly we have analyzed 
the 3,358 SNP sequences present in the SNP database of the Whitehead Institute [Wang 
et al.. Science 280: 1077-1082 (1998)]. We have determined how many of these SNPs 
represent an ESP for each of 34 known palindromic and non-palindromic tetra- and 
penta-nucleotide restriction recognition sequences. When extrapolating this number to 
the total number of ESPs in the human genome - assuming a grand total of 3 million 
ESPs - it appears that the number of detectable ESPs per probing restriction enzyme
is in the range of 25,000 to 150,000. A cumulative analysis reveals that 53% of the
SNPs affect at least one of the 34 restriction sites; a total of 28% affect the recognition
site for one of the available tetracutter enzymes. The principal conclusion from this
analysis is that many of the considered enzymes - used as probing enzymes according
to the methods of the present invention - will interrogate sufficient SNPs to be able to
build a high-density map of the human genome. It should also be noted that the use of
multiple probing enzymes is easily accommodated in the targeted assay because the
sample has to be subdivided anyway over a number of parallel multiplex PCR
reactions. This embodiment is illustrated in Example 4.
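
The kind of in silico test applied to each database SNP can be sketched as follows.
This is an illustrative sketch rather than the procedure actually used for the
analysis: the function names and the input layout (left flank, two alleles, right
flank) are assumptions, and only unambiguous A/C/G/T recognition sequences are handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_site(seq: str, site: str) -> bool:
    """True if `site` occurs on either strand of `seq`."""
    return site in seq or reverse_complement(site) in seq


def is_esp(left: str, alleles: tuple[str, str], right: str, site: str) -> bool:
    """True if exactly one of the two SNP alleles carries the recognition site.

    Only the window of len(site) - 1 bases on each side of the SNP is
    examined, so every site found necessarily overlaps the variable base.
    """
    flank = len(site) - 1
    left_part = left[max(0, len(left) - flank):]
    right_part = right[:flank]
    present = [contains_site(left_part + allele + right_part, site)
               for allele in alleles]
    return present[0] != present[1]


# A C/T SNP that converts GACGC (HgaI) into GATGC (SfaNI).
print(is_esp("TTGA", ("C", "T"), "GCAA", "GACGC"))
print(is_esp("TTGA", ("C", "T"), "GCAA", "TTAA"))
```

Checking the reverse complement covers the non-palindromic recognition sequences; for
palindromic sites the second comparison is redundant but harmless.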

It is noted that the format-III RAA may be performed according to
different procedures. One such procedure is diagrammed in Figure 5, in which the
test DNA is first sampled using a sampling endonuclease reagent, pre-amplified and
then treated with the probing endonuclease reagent. Variations on this procedure are
readily recognized by those skilled in the art and include, for example, concomitant
treatment of the test DNA with both the sampling and the probing endonuclease
reagents and the preparation of sampled DNA fragments using arbitrary PCR priming
methods [Williams et al., Nucleic Acids Res. 18: 6531-6535 (1990)]. Note that in case
the treatment with the probing endonuclease reagent is performed prior to the pre-
amplification, the subsequent amplification can be performed with any pair of PCR
primers directed against the ESP carrying fragments, thus overcoming the
limitation of using PCR primers flanking the ESPs.

Table I. Analysis of 3,358 SNPs in the Whitehead SNP database. The table lists the
number of SNPs that represent an ESP for various probing enzymes. The last column
shows the estimated number of ESPs for each enzyme in the entire human genome
(refer to text for details).

The following illustrative examples were chosen to represent the
spectrum of genomic complexities and the spectrum of degrees of genetic variation
which are susceptible to analysis using the methods of the present invention:

Example 1 describes analysis of Arabidopsis (low genomic complexity, 
low genetic variation). 

Example 2 describes genetic analysis of corn (high genomic complexity,
high genetic variation). 

Examples 3 and 4 describe genetic analysis in humans (high genomic 
complexity, low genetic variation). 

Numbers given in the examples, and that relate to the occurrence 
frequency of certain restriction sites as well as the average size of the generated 
fragments are in part based on computer simulations using publicly available DNA 
sequences.

Example 1
Genetic Analysis in Arabidopsis

In this example, a fragment analysis-based approach is used to generate
a set of genomic fragments carrying ESPs between the Arabidopsis ecotypes Landsberg
and Columbia, which are commonly used for genetic studies in the model organism.
Arabidopsis is an example of a low complexity genome (size ~120 Mb), and the two
ecotypes exhibit a moderate level of genetic variability. Previous studies have revealed
that the average nucleotide sequence variation between the two ecotypes is in the order
of 1 polymorphism in 150 nucleotides. Consequently, the fraction of fragments
carrying an ESP for tetranucleotide recognizing restriction enzymes is expected to be
in the range of 2.5% (1:40). With such a low frequency, it is helpful to use a selection
procedure to isolate the rare fragments containing ESPs.

In essence the procedure described in this example comprises the 
following steps: 

1) Identification of a set of about 200 genomic fragments carrying
Landsberg/Columbia ESPs using a gel-electrophoretic approach.

2) Isolation and characterization of the ESP carrying DNA
fragments (ESP fragments).

3) Generation of micro-arrays with the ESP fragments.

4) Confirmation of the ESPs by hybridization.

Step 1. Identification of ESP fragments
Sampling enzymes. In the present example EcoRI, a restriction enzyme
recognizing 6 nucleotides (hexacutter), in combination with BfaI, a restriction enzyme
recognizing 4 nucleotides (tetracutter), are chosen as sampling enzymes. From the
random frequency of occurrence of 6 nucleotide sequences (every 4,000 bases), the
number of sites for hexacutter restriction enzymes in this genome is predicted to be in
the range of 30,000. In addition to cleavage with a hexacutter, the genomic DNA is
also cut with a tetracutter so as to generate PCR amplifiable fragments of an average
size of a few hundred base pairs. Cleavage with the two enzymes gives rise to two
types of fragments: a majority of fragments resulting from cleavage by the tetracutter
enzyme alone and a smaller set of fragments produced by the two enzymes (see Figure
5). Since the majority of the hexacutter fragments will give rise to two fragments
having a hexacutter end and a tetracutter end (see Figure 5), this procedure will yield
a mixture of about 60,000 fragments of this type. Upon amplification using the
procedure described below only the fragments carrying a tetracutter end and a
hexacutter end are amplified efficiently (Figure 5).
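
The estimates of about 30,000 hexacutter sites and about 60,000 amplifiable fragments
can be reproduced with a short calculation. The sketch assumes a random sequence with
equal base frequencies, which is an idealization of a real genome; the function and
variable names are illustrative.

```python
def expected_site_count(genome_size_bp: int, site_length: int) -> float:
    """Expected number of occurrences of one specific site of `site_length`
    bases in a random sequence with equal base frequencies."""
    return genome_size_bp / 4 ** site_length


genome_size = 120_000_000          # Arabidopsis genome size used in this example
hexacutter_sites = expected_site_count(genome_size, 6)   # one site per 4**6 = 4096 bp
# Each hexacutter fragment contributes two fragments with one hexacutter end
# and one tetracutter end, so the count is doubled.
amplifiable_fragments = 2 * hexacutter_sites

print(round(hexacutter_sites), round(amplifiable_fragments))
```

The exact figures (roughly 29,300 sites and 58,600 fragments) agree with the rounded
values of 30,000 and 60,000 given above.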

Probing enzymes. As probing enzymes many different tetracutter 
enzymes can be used. Ideally, the probing enzyme cleaves most of the sample 
fragments once. Because plant DNA has a high AT content, the preferred tetracutters 
are those that have an AT bias in their recognition sequence. In general, the choice of 
an optimal tetracutter may be determined by particular features of the genome being 
analyzed (e.g., AT and GC content). In the present example, MseI (recognition site =
TTAA) was chosen. Tsp509I (recognition site = AATT) is an alternative. It is also
conceivable to use mixtures of two or more tetracutter enzymes. The EcoRI-BfaI
sample/target fragments that are cleaved and not cleaved with the MseI probing enzyme
are referred to as cleaved and uncleaved sample/target DNA, respectively.

Screening for ESP carrying fragments. To detect ESP fragments, subsets
of uncleaved and cleaved EcoRI-BfaI sample fragments from both ecotypes are
amplified and the amplicons are compared following gel-electrophoretic fractionation.
Subsets of the EcoRI-BfaI sample fragments are selectively amplified as described
[Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995); Zabeau, M. and Vos, P.,
European Patent Application EP 0534858 (1993), both of which are incorporated herein
by reference]. Given the complexity of the sample (~50,000 fragments), the selective
amplifications are performed with EcoRI and BfaI primers having two and three
selective nucleotides, respectively. This equals 1024 (16 x 64) different selective
amplification reactions.

The experimental procedure described by Vos, P. et al. is followed
except that the template fragments are incubated at 65°C for 10 minutes to heat-
inactivate the T4 ligase enzyme, and, when applicable, digested with the probing
enzyme prior to amplification. The structures of the EcoRI and BfaI adaptors are as
follows [see, e.g., Vos, P. et al., supra]:

5'-CTCGTAGACTGCGTACC          (SEQ ID NO: 1)
      CATCTGACGCATGGTTAA-5'   (SEQ ID NO: 2)

5'-GACGATGAGTCCTGAG           (SEQ ID NO: 3)
       TACTCAGGACTCAT-5'      (SEQ ID NO: 4)

The EcoRI (radiolabeled by 5'-phosphorylation) and BfaI primers,
having two and three selective nucleotides, respectively, have the following sequences
(where N represents A, C, G, or T):

5'-GACTGCGTACCAATTCNN     (SEQ ID NO: 5)
5'-GATGAGTCCTGAGTAGNNN    (SEQ ID NO: 6)
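
The full set of selective primers implied by the N positions can be enumerated as
sketched below. The constant and function names are illustrative; the core sequences
are those of SEQ ID NO: 5 and SEQ ID NO: 6 without their selective nucleotides.

```python
from itertools import product

ECORI_CORE = "GACTGCGTACCAATTC"   # SEQ ID NO: 5 without the two selective bases
BFAI_CORE = "GATGAGTCCTGAGTAG"    # SEQ ID NO: 6 without the three selective bases


def selective_primers(core: str, n_selective: int) -> list[str]:
    """Append every possible 3' extension of `n_selective` bases to `core`."""
    return [core + "".join(ext) for ext in product("ACGT", repeat=n_selective)]


ecori_primers = selective_primers(ECORI_CORE, 2)
bfai_primers = selective_primers(BFAI_CORE, 3)
primer_pairs = list(product(ecori_primers, bfai_primers))

print(len(ecori_primers), len(bfai_primers), len(primer_pairs))  # 16 64 1024
```

Each primer pair defines one selective amplification reaction, which gives the 1024
(16 x 64) reactions referred to in this example.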

Using these reagents, most of the obtainable target fragments contain a
cleavage site for the probing enzyme and, consequently, will not be amplified when the
target DNA is cleaved. Most of the fragments that survive the treatment with the
probing enzyme occur in both ecotypes, and thus carry no ESP. Occasionally fragments
are found that appear in both ecotypes when the target DNA is not digested and that
are present in only one of the two ecotypes after digestion. These represent true ESPs
for the probing enzyme. In addition, fragments will also be found that show typical
AFLP-polymorphism between the two ecotypes [Vos, P. et al., Nucleic Acids Res. 23:
4407-4414 (1995)]. Such polymorphisms are apparent in the fragment patterns
obtainable with the undigested sample DNAs. A typical result is shown in Figure 6, in
which the electrophoretic patterns are shown of selectively amplified EcoRI-BfaI
fragments from the ecotypes Columbia and Landsberg obtained without and with
digestion with the MseI probing enzyme.
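
The interpretation of the four lanes described above can be summarized as a small
classification routine. This is a sketch only: the function name and category labels
are hypothetical, and each band is assumed to be scored simply as present or absent.

```python
def classify_band(col_uncut: bool, ler_uncut: bool,
                  col_cut: bool, ler_cut: bool) -> str:
    """Classify one band from its presence in the four lanes.

    `col_*` and `ler_*` refer to ecotypes Columbia and Landsberg; `uncut` and
    `cut` to sample DNA without and with digestion by the probing enzyme.
    """
    if col_uncut != ler_uncut:
        # Difference already visible without the probing enzyme.
        return "AFLP polymorphism"
    if not col_uncut:
        return "band absent"
    if col_cut != ler_cut:
        # Present in both ecotypes undigested, in only one after digestion.
        return "ESP"
    if col_cut:
        return "no probing site"            # survives digestion in both ecotypes
    return "probing site in both ecotypes"  # lost after digestion in both


print(classify_band(True, True, True, False))
print(classify_band(True, False, True, False))
```

Only bands in the "ESP" category qualify for the isolation step that follows.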

Systematic comparison of the patterns of ecotypes Columbia and 
Landsberg before and after digestion allows the identification of EcoRI-BfaI sample
amplicons that carry an ESP for the probing enzyme. Using MseI as probing enzyme,
it is estimated that a total of ~200 polymorphic fragments which are present in only one
of the ecotypes can be identified.

Step 2. Isolation and characterization of ESP fragments
Each of the ESP polymorphic fragments is eluted from the gel-matrix,
re-amplified and cloned into a suitable plasmid vector (e.g. TA cloning system;
Invitrogen, Carlsbad, CA, U.S.A.). In each case, two clones are selected for sequence
determination. Most duplicate clones will yield the same sequence. Duplicate clones
that gave different sequences were not retained for further work. Since the nucleotide
sequence of over one third of the Arabidopsis genome is available in the public
databases (e.g., Genbank), the chromosomal location of one third of the ESP fragments
can be determined by matching the fragment sequences to the genomic sequence.
Furthermore, since the genomic sequence is derived from ecotype Columbia, we expect
a perfect match with the fragment sequences isolated from the same ecotype. The
sequences of the fragments isolated from ecotype Landsberg will reveal single
nucleotide differences, amongst which the potential restriction site mutations, affecting
the MseI recognition sites, should be apparent.

In addition to the ESP polymorphic fragments, a number of non- 
polymorphic control fragments are processed in the same way. Two types of such 
control monomorphic fragments are isolated: fragments that do not carry a site for the
probing enzyme and fragments that carry a site for the probing enzyme in both
ecotypes. These fragments will serve the purpose of verifying the hybridization on the
micro-arrays. 



Step 3. Fabrication of ESP micro-arrays
Micro-arrays of amplified fragments. The insert DNAs from the
sequence verified clones are amplified, e.g. with the use of non-selective EcoRI and
BfaI primers. PCR products are verified by agarose gel electrophoresis and retained if
a single product of the correct mobility was present. Following ethanol precipitation,
the resuspended PCR products are arrayed at high density on standard glass slides (25
x 76 mm) using either the Multigrid robotic spotter (GeneMachines™, Genomic
Instrumentation Services Inc., Menlo Park, CA, U.S.A.) or the BioChip Arrayer™
(Packard Instrument Company, Meriden, CT, U.S.A.). The DNAs are spotted in a
logical order with respect to the ecotype from which the fragments were isolated (upper
and lower panel) as shown in Figure 7. In addition, a set of DNAs from monomorphic
control fragments was spotted next to the ESP fragment DNAs (right panel in Figure
7).



Micro-arrays of oligonucleotides. Based on the nucleotide sequences of 
the ESP fragments, oligonucleotides can be designed that can serve as hybridization 
probes to specifically detect each amplified sample fragment. The oligonucleotide probe 
should preferably match with a sequence that is located to one side of the ESP, 
opposite the side where the sequence targeted by the labeled primer is located. In this 
way the background is minimized because the linear amplification products generated 
by the labeled primer following digestion with the probing enzyme are not detected. 
The ESP fragment specific oligonucleotides are spotted in a micro-array format in 
exactly the same way as the amplified ESP fragments. 



Step 4. Micro-array-based detection of ESPs
Preparation of the sample DNAs. For each ecotype, sample DNA is
prepared in two different ways. Genomic DNA, digested with the sampling restriction
enzymes EcoRI and BfaI, was amplified either as such or after cleavage with the
probing enzyme MseI. The amplification reactions are performed with a fluorescently
labeled EcoRI primer and an unlabeled BfaI primer, both without selective nucleotides.
The EcoRI primer is labeled by incorporation of Cy3(green)- and Cy5(red)-amidites
during primer synthesis (Amersham Pharmacia Biotech, Uppsala, Sweden). For both
Columbia and Landsberg, the cleaved sample was amplified with a Cy3-primer while
the uncleaved fragments were amplified with a Cy5-labeled EcoRI primer. In addition,
the Landsberg digested material was also amplified with a Cy5-labeled EcoRI PCR
primer. Three different hybridization solutions are then prepared by mixing equal 
amounts (i.e. equal volumes) of the Cy3- and Cy5-labeled amplification reactions: one 
from the Columbia cleaved and uncleaved samples, a second from the Landsberg 
cleaved and uncleaved samples, and a third by mixing the differentially labeled cleaved 
samples of both ecotypes. 

In case arrays of PCR products, rather than oligonucleotides, are used
as probes (refer to step 3), the co-amplification of the EcoRI-BfaI sample fragments is
preferably accomplished with a pair of adaptors that differs from those attached to the
arrayed probes. The alternative EcoRI and BfaI adaptors have the following structure:

5' -GAGCATCTGACGCATCC (SEQ ID NO: 26) 

GTAGACTGCGTAGGTTAA-5' (SEQ ID NO: 27) 

5' -CTGCTACTCAGGACTG (SEQ ID NO: 13)

ATGAGTCCTGACAT-5' (SEQ ID NO: 14) 

The cognate non-selective EcoRI and BfaI primers have the following
sequences:

5' -CTGACGCATCCAATTC (SEQ ID NO: 28)
5' -CTACTCAGGACTGTAG (SEQ ID NO: 16)

Micro-array hybridization. Each of the hybridization solutions is allowed
to hybridize to the arrayed probes using protocols well known in the art. The
experimental conditions depend primarily on the nature of the probes: PCR-amplified
fragments versus oligonucleotides. Both types of experiments are amply described in the
literature: Wodicka, L. et al., Nature Biotechnol. 15: 1359-1367 (1997); Lockhart, D.
J. et al., Nature Biotechnol. 14: 1675-1680 (1996); DeRisi, J. L. et al., Science 278:
680-686 (1997); Shalon, D. et al., Genome Res. 6: 639-645 (1996); Pietu, G. et al.,
Genome Res. 6: 492-503 (1996); Chee, M. et al., Science 274: 610-614 (1996); Wang,
D. G. et al., Science 280: 1077-1082 (1998); Winzeler, E. A. et al., Science 281: 1194-
1197 (1998), all of which are incorporated herein by reference.

A laser scanning system (ScanArray 3000; General Scanning Inc.,
Watertown, MA, U.S.A.) is used to detect the two-color fluorescence hybridization
signals from the micro-arrays at a resolution of 10 microns per pixel. A separate scan
is carried out for each of the two fluorophores used. Scanning parameters and laser
power settings are adjusted to normalize the signal in the two channels (channel-1/Cy3;
channel-2/Cy5). The obtained digital images are analyzed using the ImaGene(TM) image
analysis software (BioDiscovery Inc., Los Angeles, CA, U.S.A.). The extracted
quantitative data are transferred to a spreadsheet for further analysis.

The present hybridization experiment is essentially set up as a
confirmation of the gel-electrophoretic data (refer to step 1), and has, therefore, a
predictable outcome. In addition, a number of control probes are included on the
biochip that detect monomorphic EcoRI-BfaI Arabidopsis fragments (i.e., fragments
on which a site for the probing enzyme is either present or absent in both ecotypes).
The results from these control probes allow correction for background and optical
cross-talk between the two channels, as well as calibration of the red and green
hybridization signals. It is anticipated that the vast majority of the processed data are
unambiguous with respect to the allelic state of a sample fragment and in agreement
with the gel-electrophoretic analysis. Figure 7 shows a false-color representation of the
idealized results of the present experiment using a fictitious array of probes. It cannot
be excluded that certain hybridization results are not in agreement with the gel-
electrophoretic assay and/or that certain probes do not allow unambiguous
determination of the allelic state of the cognate sample fragment. Such probes should
be excluded from the micro-arrays that are used to genotype experimental Arabidopsis
samples, other than the Columbia and Landsberg controls used in the present
illustrative example.
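
The calibration of the two channels against control probes can be sketched as follows.
This is only an illustration under stated assumptions: the function name
normalize_ratios, the use of a median as calibration factor, and the intensities are
hypothetical and are not taken from the image-analysis software. The control probes
used here are assumed to be those lacking a site for the probing enzyme, so that they
give a signal in both channels.

```python
from statistics import median


def normalize_ratios(cy3, cy5, control_ids):
    """Scale Cy3/Cy5 ratios so that monomorphic control probes give a ratio of ~1.

    cy3, cy5: dicts mapping probe id -> background-corrected intensity.
    control_ids: ids of monomorphic control probes without a probing-enzyme site
    (assumed to hybridize equally in both channels once calibrated).
    """
    # Calibration factor: median raw Cy3/Cy5 ratio over the control probes.
    factor = median(cy3[p] / cy5[p] for p in control_ids)
    return {p: (cy3[p] / cy5[p]) / factor for p in cy3}


# Hypothetical intensities, for illustration only.
cy3 = {"ctrl1": 2000.0, "ctrl2": 1000.0, "esp1": 50.0, "esp2": 1800.0}
cy5 = {"ctrl1": 1000.0, "ctrl2": 500.0, "esp1": 1000.0, "esp2": 900.0}
ratios = normalize_ratios(cy3, cy5, ["ctrl1", "ctrl2"])
```

After this step a probe whose cleaved (Cy3) signal is lost shows a ratio far below 1,
while a probe unaffected by the probing enzyme shows a ratio close to 1.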

In routine genotyping experiments, either one of the hybridization
schemes outlined above can be used. Determination of the allelic state can be done by
comparing the hybridization signals obtained with and without cleavage of the starting
DNA with the probe reagent. Alternatively, allele-calling could be based on a
comparison of the signals obtained with the test sample and an appropriate control (e.g.
Columbia or Landsberg DNA), both cleaved with the probe endonuclease reagent. The
samples that need to be compared can, in principle, be hybridized separately, but a
preferred method consists of hybridizing a mixture of differentially labeled samples to
the same array.

Example 2
Genetic Analysis in Corn

In this example, the utility of the method of the invention for marker-assisted
selection applications in plant and animal breeding is illustrated. Corn has been
chosen because it is a typical representative of crop species having a complex genome.
The large size of the genome (2,400 Mb), the frequent occurrence of repetitive DNA
sequences and the high degree of genetic variation all constitute technical challenges.
In this example, an approach is used that is based on the generation of a set of genomic
fragments carrying ESPs from two well-known inbred lines of corn, B73 and Mo17, from
which many of the corn elite lines are derived. Another reason for choosing these
lines is that a well-studied recombinant inbred population derived from these lines is
available. This population can be used to map the set of ESPs. The genetic map of ESP
markers will prove to be an effective tool for genetic selection in corn breeding. It is
evident, however, that a broader survey of the corn germplasm with a total of 10 to 20
lines will give a large number of additional ESPs (possibly 2 or 3 times as many) and
will eventually result in a higher-resolution genetic map.

The ESP-harboring fragments could very well be identified by the gel-
electrophoretic approach described for Arabidopsis (Example 1). However, an
alternative strategy may be used given that the corn germplasm, like many crop
species, exhibits a high degree of genetic variation. Indeed, based on previous studies,
the average nucleotide sequence variation in the corn germplasm is estimated to be in
the order of 1 difference in 15 to 30 nucleotides. This corresponds to a frequency of
ESPs in the recognition sites of tetracutter restriction enzymes of 1 in 4. At this
frequency it becomes feasible to directly examine arrays of random B73/Mo17
fragments for the presence of ESPs using the present RAA method without prior
screening or selection. The strategy also lends itself readily to screening with several
different probing enzymes.
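
The relation between the per-nucleotide variation and the ESP frequency can be checked
with a short calculation. The sketch below assumes that substitutions occur
independently and with equal probability at each position of a four-base recognition
site; that model and the function name site_polymorphism_rate are assumptions made
here for illustration.

```python
def site_polymorphism_rate(per_nt_rate, site_length=4):
    """Probability that at least one position of a recognition site differs."""
    return 1.0 - (1.0 - per_nt_rate) ** site_length


high = site_polymorphism_rate(1 / 15)  # most variable end of the stated range
low = site_polymorphism_rate(1 / 30)   # least variable end of the stated range
print(f"about 1 in {1 / high:.1f} to 1 in {1 / low:.1f} tetracutter sites")
```

Under this simple model the figure of 1 in 4 corresponds to the more variable end of
the range (1 difference in 15 nucleotides); the less variable end gives a frequency
closer to 1 in 8.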

In the present example, two different approaches for assaying ESPs are
used. The first method (format-I RAA) is similar to the one described in Example 1,
and detects ESPs in fragments sampled with a pair of restriction enzymes. In the
second method (format-III RAA) individual ESPs are selectively amplified from the
sampled fragments with dedicated primer sets. The principal advantage of the latter
approach is that ESPs detected with several different probing enzymes can be assayed
simultaneously, and that multiplex amplification of ESP-specific PCR products is made
considerably more robust.

In essence the procedure described in this example comprises the 
following steps: 

1. Identification of a set of candidate ESP fragments from the
inbred lines B73 and Mo17

2. Development of a corn ESP micro-array

3. Genetic mapping of a B73/Mo17 recombinant inbred population
and of segregating populations

Step 1. Identification of candidate ESP fragments
Cloning of a set of sample fragments. To clone a set of random
fragments from the inbred lines B73 and Mo17, the enzyme combination PstI and BfaI
is used. The hexanucleotide-recognizing enzyme PstI was chosen because of the large
size of the corn genome. It is estimated that this enzyme has around 30,000 sites in the
corn genome. The second enzyme, the tetracutter BfaI, is expected to cleave in the
majority of cases on both sides of the PstI sites. The double digestion will therefore
generate about 60,000 sample fragments with an average size of 400-500 base pairs.

Following double digestion of the genomic DNA, PstI and BfaI
adaptors are ligated to the fragment ends and the material is amplified with
non-selective PstI and BfaI primers. The structures of the PstI and BfaI adaptors are based
on those described by Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995):

5' -CTCGTAGACTGCGTACATGCA (SEQ ID NO: 7)
3' -CATCTGACGCATGT (SEQ ID NO: 8)

5' -GACGATGAGTCCTGAG (SEQ ID NO: 3)
3' -TACTCAGGACTCAT (SEQ ID NO: 4)



The corresponding PstI and BfaI non-selective primers have the following sequences:

5' -GACTGCGTACATGCAG (SEQ ID NO: 9)
5' -GATGAGTCCTGAGTAG (SEQ ID NO: 10)

The amplification step enriches the PstI-BfaI fragments over the large
excess of BfaI-BfaI fragments. After amplification the fragments are fractionated on
an agarose gel to eliminate the fragments smaller than 100 base pairs, and cloned in an
appropriate vector (e.g. TA cloning system; Invitrogen, Carlsbad, CA, U.S.A.).

Preparation of spotted micro-arrays with the cloned sample DNA
fragments. The insert DNAs from the two libraries of cloned PstI-BfaI sample
fragments (obtained from the B73 and Mo17 inbred lines) are amplified from the
clones using the non-selective PstI and BfaI primers. Following purification and
concentration, the amplicons are arrayed as described in Example 1. A total of 20,000
(i.e. 10,000 from each library) candidate probe DNAs are spotted.

Micro-array hybridization and selection of candidate ESP fragments.
From genomic DNA of the inbred lines B73 and Mo17, four different sets of PstI/BfaI-
digested amplified DNA are prepared. An alternative pair of adaptors and non-selective
amplification primers is used for this:

5' -GAGCATCTGACGCATGTTGCA (SEQ ID NO: 11)
3' -GTAGACTGCGTACA (SEQ ID NO: 12)

5' -CTGCTACTCAGGACTG (SEQ ID NO: 13)
3' -ATGAGTCCTGACAT (SEQ ID NO: 14)

5' -CTGACGCATGTTGCAG (SEQ ID NO: 15)
5' -CTACTCAGGACTGTAG (SEQ ID NO: 16)

The sample fragments are amplified either as such or after digestion with
one of three alternative probing enzymes, MseI, Tsp509I and AluI. As probing
enzymes many different tetracutter or pentacutter enzymes can be used. Because plant
DNA has a high AT content, the preferred enzymes are those that have an AT bias in
their recognition sequence. Alternatively, mixtures of two or more tetracutter or
pentacutter enzymes can be used.

For each of the B73 samples, a Cy3 (green)-labeled PstI primer is used,
whereas the Mo17-derived fragments are amplified with a Cy5 (red)-labeled PstI primer
(refer to Example 1). Four different hybridization solutions are then prepared, each by
mixing equal amounts of the B73 and Mo17 samples that received the same treatment
(uncleaved, MseI-cleaved, Tsp509I-cleaved, or AluI-cleaved). Each of the 4 mixes is
allowed to hybridize to the micro-arrays. Analysis of the scanned images involves
normalization using the multitude of probes on the arrays that detect monomorphic
fragments. Figure 8 shows a false-color representation of the idealized results of the
present experiment using a fictitious array of probes.

Analysis reveals that candidate ESP fragments are readily identified by
scoring the probes that hybridize with only one of the two inbred line sample DNAs
after cleavage with the probe enzyme (Figure 8). The quantitative analysis allows
the use of an unambiguous cut-off threshold of a 10-fold difference in the normalized
signal intensities for scoring ESPs. It should be pointed out that the assay identifies
both bona fide ESPs and polymorphisms in the sampling enzyme sites. Most of the
latter polymorphisms result in a marked hybridization difference with the sample DNAs
not cleaved with the probe enzyme (see Figure 8). Analysis of 180 probes reveals that
roughly 6% of the sample fragments carry ESPs for MseI, Tsp509I, or AluI, in
accordance with the expected ESP mutation frequency. The analysis of 20,000 cloned
probe fragments is thus expected to yield a total of 1,200 fragments carrying ESPs for
the three probe enzymes tested. By using additional tetracutter and pentacutter
enzymes (see Table I), the fraction of ESP-carrying fragments may be as high as 25%,
amounting to 5,000 ESPs.
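
The expected yields quoted above follow directly from the number of arrayed probes and
the fraction carrying ESPs. A minimal check, in which the function name
expected_esp_fragments is hypothetical:

```python
def expected_esp_fragments(n_probes, esp_fraction):
    """Expected number of arrayed probe fragments that carry an ESP."""
    return round(n_probes * esp_fraction)


three_enzymes = expected_esp_fragments(20_000, 0.06)  # MseI, Tsp509I and AluI
extended_set = expected_esp_fragments(20_000, 0.25)   # with additional enzymes
```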

Of all probes that exhibit a differential hybridization with the cleaved
sample DNAs, only those in which the recognition site for the probing enzyme is
present are retained for development of a corn micro-array. Sequence determination
of these probe fragments reveals the position of the recognition site for the probe
enzyme. Thus, we retain only those probes that fail to give a signal with the
cleaved sample DNA from the same inbred line from which they were isolated. Such
probes exhibit the hybridization pattern shown in the table below and are marked
with an arrow in Figure 8.

                B73/Mo17 (Cy3/Cy5) normalized hybridization signal
                Undigested      MseI/Tsp509I/AluI-digested
B73-probes      ~1              < 0.1
Mo17-probes     ~1              > 10
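
The selection rule summarized in the table can be written out as a small classifier.
This is a sketch only: the function name classify_probe, the category labels, and the
reading of "~1" as "less than a 10-fold difference" are assumptions introduced here.

```python
def classify_probe(origin, undigested_ratio, digested_ratio, fold=10.0):
    """Classify a probe from its B73/Mo17 (Cy3/Cy5) normalized signal ratios.

    origin: "B73" or "Mo17", the library from which the probe was cloned.
    Returns "retained ESP", "ESP, site absent in probe",
    "sampling-site polymorphism" or "monomorphic".
    """
    def differs(ratio):
        return ratio >= fold or ratio <= 1.0 / fold

    # A difference without probing-enzyme cleavage points to the sampling sites.
    if differs(undigested_ratio):
        return "sampling-site polymorphism"
    if not differs(digested_ratio):
        return "monomorphic"
    # Loss of the B73 (Cy3) signal after cleavage means the site is present in B73.
    site_in = "B73" if digested_ratio <= 1.0 / fold else "Mo17"
    return "retained ESP" if site_in == origin else "ESP, site absent in probe"


calls = [
    classify_probe("B73", 1.0, 0.05),
    classify_probe("Mo17", 1.1, 12.0),
    classify_probe("B73", 0.9, 15.0),
]
```

Only probes in the first category, i.e. those carrying the recognition site themselves,
correspond to the two rows of the table.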



Step 2. Development of a corn ESP micro-array
Sequencing of the candidate ESPs and design of marker-specific primers.
Clones corresponding to the probes that yield the desired hybridization pattern (Figure
8) are sequenced. The majority of the insert DNAs derived from these clones contain
a single recognition site for the probing enzyme. For each unique candidate ESP, two
specific PCR primers, flanking the restriction site, are designed.

In addition, the sequence of a limited set of probes that yielded invariant
hybridization signals is also determined. PCR primers targeting these monomorphic
sequences are included as references; they are used to calibrate the hybridization
signals.

Validation of the candidate ESPs and fabrication of corn micro-arrays.
The candidate ESPs, identified under step 1, are subjected to a confirmatory
experiment using the format-III approach. First, four pre-amplification reactions are
performed with a single primer pair and using the PstI-BfaI fragments, undigested or
digested with either one of the three probing enzymes, as template material. These
amplification reactions reduce the complexity of the DNA under study by more than
two orders of magnitude while at the same time generating a large enough amount of
material for the subsequent multiplex marker-specific PCRs. The pre-amplifications
are then used for the PCR rescue of each of the characterized candidate ESPs using
dedicated primer couples [refer to Wang, D. G. et al., Science 280: 1077-1082 (1998)].
Particular sets of the ESP-specific primers that amplify the same type of ESP (i.e.
ESPs for one particular probing enzyme) are combined in a single reaction, together
with the appropriate pre-amplification material as template. One of the ESP-specific
primers is either Cy3- or Cy5-labeled; the other remains unlabeled. The Cy3-primers
are used for the multiplex amplification of the DNA that has previously been digested
with a probing enzyme, whereas the Cy5-primers are used with undigested control
DNA. The PCR products from the various multiplex reactions performed on both
digested and undigested DNA are pooled together to obtain a single hybridization
mixture per starting DNA. The B73- and Mo17-derived materials are analyzed in
parallel experiments. The set of ESP-specific unlabeled PCR primers serves as
hybridization probes and is arrayed in the same way as amplification products.
Conditions used are similar to those previously described for hybridization against
oligonucleotide probes and are readily determined by one of ordinary skill in the art.

Direct comparison of the normalized Cy3 and Cy5 hybridization signals
allows determination of the allelic state of the endonuclease target site in B73 versus
Mo17. Primer pairs that do not allow unambiguous allele calling or that do not
confirm the candidate ESPs identified with PstI-BfaI sampling (refer to step 1) are not
retained for further work.

Step 3. Genetic analysis of a B73/Mo17 recombinant inbred population and of
segregating populations
Genetic analysis of a B73/Mo17 inbred population. A collection of
recombinant inbred lines derived from a cross between B73 and Mo17 is publicly
available and provides a most useful set of lines for verifying and mapping the
collection of ESP markers. The advantage of recombinant inbred lines over segregating
populations is that each inbred line contains a different set of homozygous chromosome
segments derived from either parent line. Consequently each ESP will be scored as
either present or absent. Preparation of the sample DNAs and hybridization against the
arrayed probes are performed as described under step 2. The experiment will, in the
first place, allow the testing of selected ESPs in over 100 measurements; the results
will lead to the development of a second-generation system that will only detect the
most consistent ESPs. In addition, the linkage analysis of the segregation data will
allow the construction of a fine genetic map of the markers. Finally, based on the
mapping data, an ordered ESP micro-array is developed for corn.

Genetic analysis of segregating populations. Although the ESP markers were isolated
from two inbred lines, it is anticipated that the above-mentioned ordered ESP
micro-arrays will detect sufficient genetic polymorphism in other corn lines to be useful
for marker-assisted selection. To demonstrate the applicability, one could choose either a
segregating F2 population or a back-cross population. Sample preparations and
hybridizations are again performed as described under step 2. In this experiment, the
ESP markers must be scored quantitatively so as to differentiate between heterozygosity
and homozygosity. Because only the most consistent markers are retained, a two-fold
difference in signal intensity is easily monitored. The approach used consists of
normalizing the hybridization signal intensities and then applying a mixture model
analysis on the normalized data. This statistical approach consists of determining
whether the relative signal intensities can be grouped into three discrete classes,
corresponding respectively to homozygous present, heterozygous and homozygous
absent. ESP markers that do not fulfill this criterion should be eliminated from the
analysis.
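
One way to carry out such a mixture model analysis is sketched below: a one-dimensional
Gaussian mixture with three components fitted by expectation-maximization. The choice
of Gaussian components, the initialization, the function name
fit_three_class_mixture, and the signal values are all assumptions made for
illustration; the text does not specify the model. A complete analysis would also test
whether the three fitted classes are sufficiently separated before accepting a marker.

```python
import math


def fit_three_class_mixture(values, n_iter=100):
    """Fit a 1-D Gaussian mixture with three components by EM.

    Returns (means, labels): means sorted in increasing order, and for each value
    the index (0, 1 or 2) of its most probable component in that order.
    """
    lo, hi = min(values), max(values)
    means = [lo, (lo + hi) / 2.0, hi]        # crude initialization
    sds = [max((hi - lo) / 6.0, 1e-3)] * 3
    weights = [1.0 / 3.0] * 3

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [w * pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update weight, mean and standard deviation of each component.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)

    # Report components in order of increasing mean.
    order = sorted(range(3), key=lambda k: means[k])
    rank = {k: i for i, k in enumerate(order)}
    labels = [rank[max(range(3), key=lambda k: r[k])] for r in resp]
    return [means[k] for k in order], labels


# Hypothetical normalized signals for one marker across ten plants.
signals = [0.02, 0.05, 0.03, 0.48, 0.52, 0.55, 0.50, 0.97, 1.02, 1.05]
means, labels = fit_three_class_mixture(signals)
```

With well-behaved markers the three fitted means lie near 0, 0.5 and 1 times the
homozygous-present signal, reflecting the two-fold difference mentioned above.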



Example 3
Human Genetic Analysis Using the Format-I RAA
This example illustrates the application of the method of the invention
for genome-wide genetic analysis in humans. The human genome is an example of a
high-complexity genome (size ~3,000 Mb) combined with a very low level of genetic
variability. Single nucleotide differences between pairs of allelic sequences from
different individuals occur approximately once in every 1000 base pairs; in the
population at large, the frequency may be in the order of 1:300. As with Arabidopsis,
such a low frequency necessitates the use of a selection procedure for the
isolation/enrichment of the rare ESP-harboring fragments. In this example a batch-wise
hybridization is used to accomplish this.

Based on the known mutation frequencies, it can be estimated that the
ESP frequency for a tetracutter probing enzyme is in the order of 1 in 125 recognition
sites. This low level of genetic variation, in combination with the sensitivity of micro-
array hybridization, limits the number of ESPs that can be detected in a single assay
(typically ranging from a few hundred to one thousand, a few thousand at the most).
These limitations can, to a certain extent, be overcome by choosing probing enzymes
that recognize tetranucleotide sites containing a CpG dinucleotide. Indeed, it is well
documented that a substantial fraction (> 25%) of the nucleotide substitutions in the
human genome result from C->T transitions in CpG dinucleotides. Such CpG
dinucleotides represent mutational hotspots in vertebrates because a large fraction of
the cytosines are methylated and subsequently mutate to thymine by deamination. It is
estimated that the mutation frequency of methylated cytosines is 6- to 8-fold higher than
average. Hence probing enzymes that cleave CpG-containing recognition sites will
yield ESPs at correspondingly higher frequencies, estimated at ~5%. However, the
adverse consequence of the high mutation rate is that CpG is relatively rare in
mammalian DNA, occurring with a frequency of 1 in 100 nucleotides [Wang, D. G.
et al., Science 280: 1077-1082 (1998)] instead of 1 in 16. Likewise the frequency of
CpG-containing tetranucleotide sites is 1 in ~1600 instead of 1 in 256 bases. To
compensate for this, a probe endonuclease reagent can be used that comprises two or
more of the following complementary restriction enzymes: TaqI (TCGA), MspI
(CCGG), MaeII (ACGT), and HinPI or HhaI (GCGC). It should be noted, however, that
cleavage by MaeII as well as the isoschizomers HinPI and HhaI is blocked by
methylation of the cytosine residue (mC) within the CpG dinucleotide. These enzymes
will thus only cleave at a fraction of their sites, namely the non-methylated sites.
Analysis of the large amount of publicly accessible human genomic DNA sequence
shows that the cocktail of the 4 enzymes will cleave once in every 400 bp on average.
The total number of sites in the genome is thus in the order of 7.5 million. Assuming
that the ESP frequency is 5%, the enzyme cocktail has the potential of detecting
~375,000 ESPs. In addition to using combinations of restriction endonucleases, one
may also use reaction conditions that decrease the cleavage specificity. Such a strategy
has been applied to obtain a restriction endonuclease reagent, designated CGaseI, that
is capable of cleaving DNA at CpG dinucleotides [Mead, D. et al., WO 94/21663].
This CGaseI restriction endonuclease reagent may be particularly useful for the analysis
of human polymorphisms using the methods of the present invention.
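
The estimates in this paragraph can be reproduced with simple arithmetic. The constant
names below are introduced here for illustration; the values are those stated in the
text.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3,000 Mb human genome
MEAN_SPACING_BP = 400           # one site for the 4-enzyme cocktail per ~400 bp
ESP_FREQUENCY = 0.05            # ~5% of CpG-containing sites are polymorphic

# Expected spacing of a given dinucleotide or tetranucleotide in random sequence.
random_dinucleotide_spacing = 4 ** 2
random_tetranucleotide_spacing = 4 ** 4

total_sites = GENOME_SIZE_BP // MEAN_SPACING_BP
detectable_esps = round(total_sites * ESP_FREQUENCY)
```

The first two values are the "1 in 16" and "1 in 256" expectations for random sequence
against which the observed CpG depletion is compared.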

The example described below illustrates the approach in a limited-scale
assay, which characterizes the human ESPs within CpG-containing tetranucleotide
recognition sites using the sampling enzyme combination PacI-BfaI. The rare cutter
PacI is estimated to have only about 50,000 cleavage sites in the human genome; the
frequent cutter BfaI will generate two fragments per PacI site. The enzyme combination
will, therefore, create a moderately complex set of 100,000 PacI-BfaI target fragments.
This fragment set captures a sizable number of CpG-containing restriction sites,
estimated in the order of 40,000. Assuming a 5% ESP frequency, the number of
detectable ESPs is in the order of 2000. It should be stressed that many different
sampling enzyme combinations can be used and that thus a substantial fraction of the
~375,000 ESPs located within NCGN-type restriction sites can be monitored.

The procedure outlined in this example comprises the following steps:

(1) Development of a set of candidate PacI-BfaI ESP fragments

(2) Genetic analysis of humans using ESP probe fragments

Step 1. Development of a set of PacI-BfaI probe fragments

A mixture of sample fragments, derived from various individuals in the
population, can be divided in three classes with respect to sites for the probing enzyme:
monomorphic fragments that are devoid of a cleavage site, fragments that are always
cleaved, and fragments that carry one polymorphic recognition site. Fragments that are
digested will be referred to as S+ fragments and fragments lacking the site as S-
fragments. Polymorphic ESP fragments will thus be the only fragments present in both
the S+ and S- populations of sampling fragments. This forms the basis for their
selection by batch-wise hybridization: only ESP fragments are capable of annealing
when mixing the S+ and S- fragment collections. The hybridization-selection can be
performed in two different, reciprocal ways: either the S+ fragments can be used to
retrieve the matching S- fragments, or S- fragments are used to collect the
complementary S+ sampling fragments. In one approach, the selected candidate ESP
fragments may be isolated by cloning, arrayed, and subsequently validated by testing
various sample DNAs (e.g. the various sample DNAs used as starting material for the
hybridization-selection). Candidate ESP probe fragments that appear to detect
monomorphic sample fragments may either be removed from the array or retained as
control elements on the array. An alternative approach consists of performing the two
reciprocal hybridization-selections, cloning the selected fragments, and identifying
ESPs by means of matching S+ and S- fragments. The latter strategy is outlined
below.

(i) Preparation of S+ and S- fragments. The preferred starting
material is a mixture of genomic DNA from a number of representative
individuals. Such individuals (ranging from 5 to 50) may be chosen from various
CEPH (Centre d'Etude du Polymorphisme Humain) pedigrees [Wang, D. G. et al.,
Science 280: 1077-1082 (1998)]. Following cleavage of the DNA mixture with the
PacI/BfaI combination of sampling enzymes, appropriate oligonucleotide adaptors as
described above are ligated to the fragment ends. This template DNA is divided in two
aliquots, which are treated separately to prepare the S+ and S- fragment mixes,
respectively. To prepare the S- fragment mix, the target DNA fragments are cleaved with
the probing enzyme and then amplified. This will result in a mixture of fragments that do
not contain sites for the probing enzyme. Furthermore, the S- fragment mixture may be
prepared by using one biotinylated primer, such that the resulting PCR product can be
captured onto a solid substrate, such as magnetic beads conjugated with streptavidin.
S+ fragments are prepared by (1) amplifying the mixture of PacI-BfaI fragments, (2)
digesting the PCR product with one of the four NCGN-recognizing enzymes, (3)
ligating appropriate adaptors to the ends generated by the probing enzyme (see EP 0
534 858, incorporated herein by reference), and (4) re-amplifying the resulting
material using one primer that recognizes the probe enzyme adaptor and one primer that
recognizes one specific sampling enzyme adaptor. Similar to the S- fragments, the
amplification reaction can be performed making use of a biotinylated primer that
matches the probe enzyme adaptor such that the S+ fragment mixture can be
immobilized.

Two alternative pairs of PacI and BfaI adaptors, as well as
corresponding non-selective primers, are used; e.g. set I is used for the amplification
of the S- fragments and set II for the preparation of S+ fragments:
Set I
5' -CTCGTAGACTGCGTACCCAT (SEQ ID NO: 17)
3' -CATCTGACGCATGGG (SEQ ID NO: 18)



5' -GACGATGAGTCCTGAG (SEQ ID NO: 3) 

3' -TACTCAGGACTCAT (SEQ ID NO: 4) 

5' -GACTGCGTACCCATTA (SEQ ID NO: 19) 

5' -GATGAGTCCTGAGTAG (SEQ ID NO: 10) 



Set II

5' -GAGCATCTGACGCATGGGAT (SEQ ID NO: 20)
3' -GTAGACTGCGTACCC (SEQ ID NO: 21)

5' -CTGCTACTCAGGACTG (SEQ ID NO: 13)
3' -ATGAGTCCTGACAT (SEQ ID NO: 14)

5' -CTGACGCATGGGATTA (SEQ ID NO: 22)

5' -CTACTCAGGACTGTAG (SEQ ID NO: 16)

The adaptor ligated to the ends generated by the NCGN-cleaving probing enzyme and 
the corresponding amplification primer have the following structures: 

5' -GTCCTCATCGAGCATG (SEQ ID NO: 23)
3' -AGTAGCTCGTACGC (SEQ ID NO: 24)

5' -CCTCATCGAGCATGCG (SEQ ID NO: 25)

(ii) Hybridization-selection step(s). The S- fragment mix is
hybridized to the biotinylated S+ fragments. Following hybridization, the biotinylated
products are captured onto streptavidin-coated magnetic beads. The beads are
repeatedly washed to remove all unhybridized fragments and thereafter the hybridized
S- fragments are eluted. These are then reamplified with the PacI and BfaI primers and
the hybridization-selection procedure is repeated at least once. Finally the amplified
fragments are cloned in an appropriate vector and a series of around 2,000 inserts are
sequenced. To select a set of S+ fragments, this procedure is repeated in reverse, using
this time biotinylated S- fragments. Upon comparison of the S+ and S- sequences, ESP
fragments are readily identified as fragments having partially overlapping sequences
and in which the S- fragment sequence shows a mutated NCGN restriction site at the
internal boundary of the overlap. In this way, >500 ESPs are readily characterized.
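
The sequence comparison that identifies ESPs from matching S+ and S- clones can be
sketched as follows. This is a simplified illustration, not the procedure of the
example: it assumes that each S+ sequence is read from the sampling-enzyme end up to and
including an intact probe site, that each S- sequence starts at the same sampling-enzyme
end, and that both are given on the same strand. The function name find_esp_pairs and
the sequences are hypothetical.

```python
PROBE_SITES = {"TCGA", "CCGG", "ACGT", "GCGC"}  # TaqI, MspI, MaeII, HinPI/HhaI


def find_esp_pairs(s_plus, s_minus, site_len=4):
    """Pair S+ and S- sequences that overlap and differ at the probe site.

    s_plus: sequences assumed to end with an intact probe site.
    s_minus: sequences assumed to span the sampled fragment from the same end.
    Returns a list of (s_plus_seq, s_minus_seq, mutated_site) tuples.
    """
    pairs = []
    for sp in s_plus:
        flank, site = sp[:-site_len], sp[-site_len:]
        if site not in PROBE_SITES:
            continue
        for sm in s_minus:
            if len(sm) < len(sp) or not sm.startswith(flank):
                continue
            variant = sm[len(flank):len(sp)]
            mismatches = sum(a != b for a, b in zip(site, variant))
            # A single substitution that destroys the site marks a candidate ESP.
            if mismatches == 1 and variant not in PROBE_SITES:
                pairs.append((sp, sm, variant))
    return pairs


# Hypothetical clone sequences: the first S- clone carries TTGA instead of TCGA.
s_plus = ["GATTACAGGTCGA"]
s_minus = ["GATTACAGGTTGATTTCAG", "CCCCAAAATTTTGGGG"]
esp_pairs = find_esp_pairs(s_plus, s_minus)
```

In practice the comparison would also have to consider both strands and tolerate
sequencing errors in the shared flank.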

Step 2. Genetic analysis of humans using ESP probe fragments
The sequence-verified ESP fragments are spotted on micro-arrays for
genetic analysis of human sample DNA. For the preparation of this sample DNA, a pair
of adaptors/primers is used that differs from those attached to the arrayed S- or S+ set of
ESP fragments. From each individual, an undigested control sample and a
probe-enzyme-digested test sample are prepared. These samples are labeled with Cy3 and
Cy5, mixed and hybridized to the micro-arrays as described before. Alternatively, the
hybridization mixture may be composed of differentially labeled test DNA and previously
genotyped control DNA, both digested with the probing endonuclease. In both cases, the Cy3
(test/digested sample) and Cy5 (control/undigested DNA) signal intensities are normalized
using a number of monomorphic control probes. The ratio of these normalized Cy3/Cy5
signals for each of the ESP probes allows accurate determination of the allelic state of the
sample at each polymorphic site (homozygous S+/S+, homozygous S-/S-, heterozygous
S+/S-).
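
For the first scheme (digested test sample in Cy3, undigested control in Cy5), allele
calling from the normalized ratio can be sketched as below. The expected ratios of
about 0, 0.5 and 1 follow from the fact that only the allele lacking the site is
amplified after digestion; the cut-off values 0.25 and 0.75 and the function name
call_genotype are assumptions chosen for illustration and would need to be calibrated
on real data.

```python
def call_genotype(ratio, low=0.25, high=0.75):
    """Call the allelic state from a normalized digested/undigested signal ratio.

    ratio: normalized Cy3 (digested) / Cy5 (undigested) signal for one ESP probe.
    low, high: hypothetical cut-offs between the three expected classes.
    """
    if ratio < low:
        return "S+/S+"  # both alleles cleaved, signal lost
    if ratio < high:
        return "S+/S-"  # one allele cleaved, about half the signal
    return "S-/S-"      # neither allele cleaved, full signal


genotypes = [call_genotype(r) for r in (0.03, 0.52, 0.98)]
```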

The micro-array hybridization experiment may in the first place be
performed with the sample DNAs, deriving from a collection of individuals, from which
the ESP probe fragments were isolated. Such an experiment will, in the first place, confirm
the polymorphic nature of the selected probe fragments and allow their testing in a
multitude of measurements. The data will also yield information on the allele frequencies
among an appreciable number of chromosomes.

Human Genetic Analysis Using Format-II RAA
As described for corn in Example 2, the format-I ESP assay for human
genetic analysis may be converted to a format-II or a format-III assay. Based on the
sequence of the selected and experimentally validated ESP fragments, it is indeed
possible to design a pair of dedicated, i.e. ESP-specific, PCR primers. Such primers
can be combined in a number of parallel multiplex reactions, which are in turn
combined to obtain the sample DNA [Wang, D. G. et al., Science 280: 1077-1082
(1998)]. This sample DNA is hybridized against a micro-array of spotted S+ ESP
fragments (refer to Example 3). The experiment is set up such that the fluorescently
labeled ESP-specific primer and the S+ sequences are located on opposite sides of the
polymorphic site. Alternatively, the unlabeled ESP-specific amplification primers may
be arrayed as hybridization probes. The development of a format-II or format-III assay
need not be preceded by the identification of ESP fragments (using one of the methods
described in the previous examples). In the present example, we describe the
development of an RAA assay based on the sequence of previously discovered SNPs.

Close inspection of the known SNPs reveals that a significant percentage 
of them are associated with both the loss and gain of a restriction recognition site, i.e. 
each of two allelic sequences is associated with a different restriction recognition site. 
The single nucleotide substitution may inter-convert recognition sequences that are 
identical except for one nucleotide [e.g. PleI (GACTC) and HgaI (GACGC), HgaI and 
SfaNI (GATGC), SfaNI and BbvI (GCTGC)]. Alternatively, the allelic recognition 
sites may be partially overlapping [e.g. MaeII (ACGTg) and NlaIII (aCATG); in the 
latter case the inter-conversion depends on the nature of the upstream or downstream 
sequences]. Such mutually exclusive restriction site allelism offers a distinct advantage. 
The RAA technique will normally only detect the allele that is devoid of a recognition 
site for the probing enzyme; therefore, determination of the zygosity requires careful 
calibration of the signal against that observed with undigested control DNA. When each 
allele is associated with the presence/absence of a restriction site, two parallel RAA 
assays can be performed, each involving digestion with one of the alternative enzymes. 
With such an assay, both alleles can be positively identified and the zygosity is readily 
determined. The two parallel assays are best performed in a two-color mode; one of 
the primers is differentially labeled (e.g. with Cy3 and Cy5 as described previously) 
such that the amplification reactions can be mixed and hybridized against a single array 
of probes. 
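
The two-color read-out can be sketched as follows, under stated assumptions: the Cy3-labeled reaction follows digestion with the first enzyme, so it reports the allele that lacks the first site (here called B), and the Cy5-labeled reaction follows digestion with the second enzyme, so it reports the other allele (here called A). The function name, the allele labels and the detection threshold are hypothetical and serve only to illustrate the logic.

```python
def call_two_color(cy3, cy5, threshold=200.0):
    """Call zygosity at a site where allele A carries the site of
    enzyme 1 and allele B carries the site of enzyme 2.

    cy3 -- signal of the reaction digested with enzyme 1 (reports B)
    cy5 -- signal of the reaction digested with enzyme 2 (reports A)
    threshold -- hypothetical minimum signal counted as present
    """
    has_b = cy3 >= threshold  # B survives enzyme 1 and is amplified
    has_a = cy5 >= threshold  # A survives enzyme 2 and is amplified
    if has_a and has_b:
        return "A/B"
    if has_a:
        return "A/A"
    if has_b:
        return "B/B"
    return "no call"  # neither reaction gave a usable signal


if __name__ == "__main__":
    for cy3, cy5 in [(1500, 1400), (50, 1600), (1700, 40), (10, 20)]:
        print(cy3, cy5, call_two_color(cy3, cy5))
```

Because each allele produces a positive signal in one channel, no calibration against undigested control DNA is needed in this sketch.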

We have systematically explored the SNP database of the Whitehead 
Institute for mutational changes that promote restriction site inter-conversions and have 
calculated their occurrence frequency. Two SNP-associated recognition site 
inter-conversions were found to occur at high frequency: MaeII -> NlaIII and HgaI -> 
SfaNI. In both cases the mutational changes converting one site into another are C->T 
(or G->A) transitions occurring in CpG dinucleotides. This finding is entirely consistent 
with the fact that this type of mutation occurs with a 6-8 times higher frequency than other 
nucleotide substitutions. Based on the number of SNPs found in the Whitehead database, 
we estimate the total number of SNPs in the human genome for the enzyme pairs 
MaeII/NlaIII and HgaI/SfaNI at respectively 30,000 and 15,000. These numbers are 
presumably somewhat overestimated since both MaeII and HgaI are susceptible to CpG 
methylation. Consequently the inter-conversion can only be measured at the 
non-methylated sites. Therefore, in practice, RAA assays designed on the basis of sequence 
data should be validated on a number of test samples. Assays in which no cleavage takes 
place at the CpG-containing site in any of the individuals tested should be eliminated 
from the RAA bi-allelic marker systems. 
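
A screen of this kind can be sketched in a few lines. The example below is not the procedure used by the inventors; it is a minimal illustration that takes the two allelic sequences of a SNP with some flanking sequence and reports which of the four enzymes named above has a recognition site in only one of the alleles. The recognition sequences are those quoted in the text, both strands are searched, and CpG methylation is not modeled. The function names and the test sequences are hypothetical.

```python
# Recognition sequences as quoted in the text (one strand each).
SITES = {
    "MaeII": "ACGT",
    "NlaIII": "CATG",
    "HgaI": "GACGC",
    "SfaNI": "GATGC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def enzymes_with_site(seq):
    """Return the enzymes whose site occurs on either strand of seq."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return {name for name, site in SITES.items()
            if any(site in strand for strand in strands)}


def site_interconversion(allele_1, allele_2):
    """Return (sites only in allele 1, sites only in allele 2)."""
    first = enzymes_with_site(allele_1)
    second = enzymes_with_site(allele_2)
    return first - second, second - first


if __name__ == "__main__":
    # Hypothetical C->T transition in a CpG: GACGC becomes GATGC.
    print(site_interconversion("TTAGACGCAATT", "TTAGATGCAATT"))
    # Hypothetical G->A transition in a CpG: ACGTG becomes ACATG.
    print(site_interconversion("GGACGTGCC", "GGACATGCC"))
```

A SNP qualifies for the paired assay when both returned sets are non-empty, i.e. each allele carries a site that the other lacks.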

The foregoing examples are illustrative of the invention and are not intended to limit 
the scope of the invention as set out in the claims. All of the references cited herein are 
incorporated by reference. 

WE CLAIM: 

1. A method for detecting an endonuclease site polymorphism (ESP) in 
DNA, the method comprising: 

(a) isolating sample DNA; 

(b) deriving a set of concomitantly amplifiable target DNA fragments 
from the sample DNA; 

(c) treating the target DNA fragments obtained in step (b) with a 
probe restriction endonuclease reagent; 

(d) amplifying the probe restriction endonuclease reagent treated 
target DNA fragments of step (c); 

(e) analyzing the DNA of step (d) to determine which target 
fragments are amplified and/or which target fragments are not amplified; and wherein 
target DNA fragments which are amplified lack a recognition site for the probe 
restriction endonuclease reagent and target fragments having a recognition site for the 
probe restriction endonuclease reagent are not amplified. 

2. The method of claim 1 wherein the concomitantly amplifiable target DNA 
fragments of step (b) are derived by treatment of the sample DNA with a sampling 
restriction endonuclease reagent. 

3. The method of claim 2 wherein the concomitantly amplifiable DNA 
fragments of step (b) are derived from sample DNA by treatment of the sample DNA 
with a first and a second restriction endonuclease reagent. 

4. The method of claim 3 wherein said first restriction endonuclease 
reagent has a recognition sequence of six or more nucleotides and the second restriction 
endonuclease reagent has a recognition sequence of four or fewer nucleotides. 

5. The method of claim 3 or 4 wherein said concomitantly amplifiable 
target DNA fragments are derived by stepwise treatment of said sample DNA with the 

6. The method of claim 1 further comprising preparing PCR primers 
which flank the endonuclease site polymorphism (ESP) for use in amplifying said 
concomitantly amplifiable target DNA fragments. 

7. The method of claims 1, 2, 3, and 4 wherein the concomitantly 
amplifiable DNA fragments are modified by ligation of adapters to both termini of said 
fragments, and wherein said adaptors are capable of serving as primers for 
amplification. 

8. The method of claim 5 wherein the concomitantly amplifiable DNA 
fragments are modified by ligation of adapters to both termini of said fragments, and 
wherein said adaptors are capable of serving as primers for amplification. 

9. The method of claim 1 wherein the probe restriction endonuclease 
reagent of step (c) has a recognition sequence comprising six or more nucleotides. 

10. The method of claim 1 wherein the probe restriction endonuclease 
reagent of step (c) has a recognition sequence comprising four or more nucleotides. 

11. The method according to claim 1 wherein the probe restriction 
endonuclease of step (c) has a recognition sequence of two nucleotides. 

12. The method according to claim 1 wherein the order of the steps (b) and 
(c) are reversed or carried out simultaneously. 

13. The method according to claim 1 wherein said endonuclease site 
polymorphism is an alteration in a concomitantly amplifiable target fragment giving 
rise to a nucleotide sequence that is recognized and cut by the probe restriction 
endonuclease reagent. 

14. The method of claim 1 wherein said site polymorphism is an alteration 
in the nucleotide sequence of a concomitantly amplifiable target fragment which 
eliminates a recognition sequence for said probe restriction endonuclease reagent. 

15. The method of claims 1, 2, 3 and 4 wherein said concomitantly 
amplifiable DNA fragments are amplified by a polymerase chain reaction. 

16. The method of claim 5 wherein said concomitantly amplifiable DNA 
fragments are amplified by a polymerase chain reaction. 



17. The method of claim 1 wherein amplified target fragments are 
identified by their ability to hybridize to cognate probe DNA fragments. 

18. A method for obtaining probe DNA fragments for use in detecting 
endonuclease site polymorphisms, the method comprising: 

(a) isolating sample DNA; 

(b) deriving a set of concomitantly amplifiable target DNA fragments 
from the sample DNA; 

(c) selecting from the target DNA fragments probe DNA fragments 
having an endonuclease site polymorphism (ESP) for the probe restriction 
endonuclease. 

19. The method of claim 17 wherein said probe DNA fragments are derived 
by digestion of sample DNA with one or more sampling restriction endonuclease 
reagents. 

20. The method of claim 18 wherein probe DNA fragments are derived by 
digestion of a pool of sample DNAs obtained from one or more individuals of a 
species. 



21. The method of claim 18 wherein the probe DNA fragments are derived 
by digestion of a pool of sample DNAs obtained from 10 or more individuals of a 
species. 

22. The method of claim 18 wherein the probe DNA fragments are derived by 
digestion of a pool of sample DNAs obtained from a pool of 50 or more individuals of 
a species. 

23. The method of any one of claims 19-21 wherein said species is selected 
from the group consisting of procaryotic species and eucaryotic species. 

24. A method for obtaining probe DNA fragments for use in detecting 
endonuclease site polymorphisms (ESP) comprising preparing synthetic 
oligonucleotides based on the nucleotide sequence of amplifiable target DNA fragments 
containing endonuclease site polymorphism(s). 

25. A method for producing a microarray of probe DNA, the method 
comprising: 

(a) isolating sample DNA; 

(b) deriving a set of concomitantly amplifiable target DNA fragments 
from the sample DNA; 

(c) selecting probe DNA fragments having restriction endonuclease site 
polymorphisms (ESPs) from the sample restriction endonuclease treated target DNA 
fragments of step (b); and 

(d) arraying the probe DNA fragments obtained in step (c) on a solid 
substrate in a predefined region by attaching the fragments to the substrate. 

26. The method of claim 24 wherein the DNA fragments of step (b) are 
obtained by treating sample DNA with one or more sample restriction endonuclease 
reagents. 



27. The method of claim 24 wherein the said probe DNA fragments of step 
(d) are synthetic oligonucleotides which correspond to the concomitantly amplifiable 
target DNA fragments derivable from said sample DNA and containing an 
endonuclease site polymorphism (ESP). 

28. The method of claim 25, 26 or 27 wherein the solid support is selected 
from a group consisting of a planar solid support, a bead, a sphere and a polyhedron. 

29. The method of claim 25 wherein the microarray comprises at least 2,000 
probe fragments. 

30. The method of claim 26 wherein the microarray comprises at least 2,000 
synthetic oligonucleotides. 

31. The method of claim 27 wherein the microarray comprises at least 2,000 
probe fragments. 

32. The method of claim 28 wherein the microarray comprises at least 2,000 
probe fragments. 

33. The method of claim 25 wherein the microarray comprises at least 
20,000 probe fragments. 

34. The method of claim 26 wherein the microarray comprises at least 
20,000 synthetic oligonucleotides. 

35. The method of claim 27 wherein the microarray comprises at least 
20,000 probe fragments. 

36. The method of claim 28 wherein the microarray comprises at least 
20,000 probe fragments. 
SEQUENCE LISTING 

<110> METHEXIS N.V. 

<120> RESTRICTED AMPLICON ANALYSIS 

<130> 29314/34158A 

<140> 
<141> 

<150> 60/107,293 
<151> 1998-11-09 

<160> 28 

<170> PatentIn Ver. 2.0 

<210> 1 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 1 

ctcgtagact gcgtacc 17 

<210> 2 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 



<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<400> 2 

aattggtacg cagtctac 18 

<210> 3 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 3 

gacgatgagt cctgag 16 

<210> 4 
<211> 14 
<212> DNA 

<213> Artificial Sequence 



wo 00/28081 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<400> 4 

tactcaggac tcat 14 

<210> 5 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> misc_feature 
<222> (17) 

<223> At position 17 N = A, C, G, or T 
<220> 

<221> misc_feature 
<222> (18) 

<223> At position 18 N = A, C, G, or T 
<400> 5 

gactgcgtac caattcnn 18 

<210> 6 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<221> misc_feature 
<222> (17) 

<223> At position 17 N = A, C, G, or T 
<220> 

<221> misc_feature 
<222> (18) 

<223> At position 18 N = A, G, or T 
<220> 

<221> misc_feature 
<222> (19) 

<223> At position 19 N = A, C, G, or T 
<400> 6 

gatgagtcct gagtagnnn 19 



<210> 7 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 7 

ctcgtagact gcgtacatgc a 21 
<210> 8 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<400> 8 

tgtacgcagt ctac 14 

<210> 9 
<211> 16 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: primer 
<400> 9 

gactgcgtac atgcag 16 

<210> 10 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 10 

gatgagtcct gagtag 16 

<210> 11 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 11 

gagcatctga cgcatgttgc a 21 



<210> 12 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<400> 12 

acatgcgtca gatg 14 
<210> 13 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 13 

ctgctactca ggactg 16 

<210> 14 
<211> 14 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<400> 14 

tacagtcctg agta 14 

<210> 15 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 15 

ctgacgcatg ttgcag 16 

<210> 16 
<211> 16 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer 
<400> 16 

ctactcagga ctgtag 16 

<210> 17 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 17 

ctcgtagact gcgtacccat 20 

<210> 18 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<400> 18 

gggtacgcag tctac 15 

<210> 19 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 19 

gactgcgtac ccatta 16 

<210> 20 
<211> 20 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 20 

gagcatctga cgcatgggat 20 

<210> 21 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<220> 

<223> Description of Artificial Sequence: primer 



<400> 21 

cccatgcgtc agatg 15 

<210> 22 
<211> 16 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer 
<400> 22 

ctgacgcatg ggatta 16 

<210> 23 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 



<400> 23 

gtcctcatcg agcatg 16 

<210> 24 
<211> 14 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<220> 

<223> Description of Artificial Sequence: primer 



<400> 24 

cgcatgctcg atga 14 

<210> 25 
<211> 16 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: primer 



<400> 25 

cctcatcgag catgcg 16 

<210> 26 
<211> 17 
<212> DNA 
<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 
<400> 26 

gagcatctga cgcatcc 17 

<210> 27 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> As presented in the Sequence Listing the 
nucleotide sequence reads in the 5' to 3' 
direction. As presented in the specification the 
sequence reads in the 3' to 5' direction. 

<220> 

<223> Description of Artificial Sequence: primer 
<400> 27 

aattggatgc gtcagatg 18 

<210> 28 
<211> 16 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: primer 



<400> 28 
ctgacgcatc caattc 16 



